Abstract

Influence systems form a large class of multiagent systems designed to model
how influence, broadly defined, spreads across a dynamic network. We build a
general analytical framework which we then use to prove that, while sometimes
chaotic, influence dynamics of the diffusive kind is almost always asymptotically
periodic. Besides resolving the dynamics of a popular family of multiagent systems,
the other contribution of this work is to introduce a new type of renormalization-based
bifurcation analysis for multiagent systems.

1 Introduction 


The contribution of this paper is twofold: (i) to formulate an "algorithmic calculus"
for continuous, discrete-time multiagent systems; and (ii) to resolve the behavior of a
popular type of social dynamics that had long resisted analysis. In the process, we also
introduce a new approach to time-varying Markov chains. Diffusive influence systems
are piecewise-linear dynamical systems $x \mapsto P(x)x$, which are specified by a
piecewise-constant function $P$ mapping any $x \in \mathbb{R}^n$ to an $n$-by-$n$
stochastic matrix $P(x)$. We prove that, while sometimes chaotic, such systems are almost
surely attracted to a fixed point or a limit cycle.

As in statistical mechanics, the difficulty of analyzing influence systems comes from
the tension between two opposing forces: one, caused by the map's discontinuities, is
"entropic" and leads to chaos; the other one, related to the Lyapunov exponents, is
"energetic" and pulls the system toward an attracting manifold within which the dynamics
is periodic. The challenge is to show that, outside a vanishingly small critical region in
parameter space, entropy always loses. Because the interaction topology changes all the
time (endogenously), the proof relies heavily on an algorithmic framework to monitor
the flow of information across the system. As a result, this work is, at its core, an
algorithmic study in dynamic networks. Influence systems include finite Markov chains as a
special case but the differences are deep and far-reaching: whereas Markov chains have
predictable dynamics, influence systems can be chaotic even for small $n$; whereas the
convergence of a Markov chain can be checked in polynomial time, the convergence of
an influence system is undecidable. Our main result is that this bewildering complexity
is in fact confined to a vanishing region of parameter space. Typically, influence systems
are asymptotically periodic.



Influence and social dynamics. There is a context to this work and this is where
we begin. An overarching ambition of social dynamics is to understand and predict the
collective behavior of agents influencing one another across an endogenously changing
network [13]. HK systems have emerged in the last decade as a prototypical platform
for such investigations [33, 34, 36, 37, 39]. To unify their varied strands (eg,
bounded-confidence, bounded-influence, truth-seeking, Friedkin-Johnsen type, deliberative
exchange) into a single framework and supply closed-loop analogs to standard
consensus models [4, 35, 40], we introduce influence systems. These are discrete-time
dynamical systems $x \mapsto f(x)$ in $(\mathbb{R}^d)^n$: each "coordinate" $x_i$ of the
state $x = (x_1, \dots, x_n)$ is a $d$-tuple encoding the location of agent $i$ as a point
in $\mathbb{R}^d$; with any state $x$ is associated a directed graph $G(x)$ with the $n$
agents as nodes. Each coordinate function $f_i$ of the map $f = (f_1, \dots, f_n)$ takes as
input the neighbors of agent $i$ in $G(x)$ and outputs the new location $f_i(x)$ of agent
$i$ in $d$-space. One should think of agent $i$ as a "computer" and $x_i$ as its "memory."
All influence systems in this work will be assumed to be diffusive, meaning that at each
step an agent may move only within the convex hull of its neighbors.^1 Note that the
system $x \mapsto P(x)x$ in the opening paragraph corresponds to the one-dimensional case.

Influence systems arise in processes as diverse as chemotaxis, synchronization, opinion
dynamics, flocking, swarming, and rational social learning.^2 Typically, a natural
algorithm directs $n$ autonomous agents to obey two sets of rules: (i) one of them
determines, on the basis of the system's current state $x$, which agent communicates with
which one; (ii) the other one specifies how an agent updates its state by processing
the information it receives from its neighbors. Diffusive influence systems are central
to social dynamics insofar as they extend the fundamental concept of diffusion to
autonomous agents operating within dynamic, heterogeneous environments.^3 This stands
in sharp contrast with the classic brand of diffusion found in physics and chemistry,
which assumes passive particles subject to identical laws. Naturally, influence systems
are "downward-compatible" and can model standard (discrete) diffusion as well. They
also allow exogeneities (eg, diffusion-reaction) via the addition of special-purpose agents.
Autonomy and heterogeneity are the defining features of influence systems: they grant
agents the freedom to have their own, distinct decision procedures to choose their
neighbors as they please and act on the information collected from them. This explains their
ubiquity among natural algorithms.

1 This is a standard assumption meant to ensure that consensus is a fixed point.

2 The states of an influence system can be: opinions [6, 23, 26, 31, 33, 35], Bayesian
beliefs [1], neuronal spiking sequences [15], animal herd locations [20], consensus
values [12, 40], swarming trajectories [25], cell populations [47], schooling fish
velocities [43, 45], sensor networks data [10], synchronization phases [48], heart
pacemaker cell signals [52, 56], cricket chirpings [55], firefly flashings [38], yeast cell
suspensions, microwave oscillator frequencies, or flocking headings [3...].

3 For a fanciful but illustrative example, imagine $n$ insects on the ground ($d = 2$), each
one moving toward the mass center of its neighbors. Each one gets to "choose" who is its
neighbor: this cricket picks the five ants closest to it within its cone of vision; that
spider goes for the ladybugs within two feet; these ants select the 10 furthest termites;
etc. Once the insects have determined their neighbors, they move to their mass center (or
a weighted version of it). This is repeated forever.



The model. In a diffusive influence system, $f(x) = (P(x) \otimes \mathrm{Id})\,x$, where
$P(x)$ is a stochastic matrix whose positive entries correspond to the edges of $G(x)$ and
are rationals larger than some arbitrarily small $\rho > 0$. We form the Kronecker product
with the $d$-by-$d$ identity matrix to perform the averaging along each coordinate axis. We
grant the agents a measure of self-confidence by adding a self-loop to each node of $G(x)$.
Agent $i$ computes the $i$-th row of $P(x)$ by means of its own algebraic decision tree; that
is, on the basis of the signs of a finite number of $dn$-variate polynomials evaluated at
the coordinates of $x$. This high level of generality allows $G(x)$ to be specified by any
first-order sentence over the reals.^4 In a recent bird flocking model, for instance,
the communication graph joins every agent to its 7 nearest neighbors. We show below
how to reduce the dimension to $d = 1$ and linearize the system so that $P(x) = P_c$,
for any $x \in c$, where $c$ is any atom (open $n$-cell) of an arrangement of hyperplanes
in $\mathbb{R}^n$, called the switching partition (SP). An influence system is called
bidirectional if $G_{ij} = G_{ji}$ (with $G = (G_{ij})$), which implies that $G(x)$ is
undirected. Such a system is further called metrical if $G_{ij}$ is solely a function of
$|x_i - x_j|$. Homogeneous HK systems [26, 27, 31] constitute the canonical example of a
metrical system. We assume that all the relevant parameters (matrix entries, number and
coefficients of hyperplanes, $\rho$, etc) can be encoded as rationals over $O(\log n)$
bits: this assumption can be freely relaxed (in fact, the bit lengths can be arbitrarily
large as a function of $n$) and is only made to simplify the notation.



Past work and present contribution. Beginning with their introduction by Sontag [51],
piecewise-linear systems have become the subject of an abundant literature,
which we do not attempt to review here. Restricting ourselves to influence systems,
we note that the bidirectional kind are known to be attracted to a fixed point while
expending a total s-energy at most exponential in the number of agents and polynomial
in the reversible case [17, 23, 28, 35, 40]. Convergence times are known only in the ...
In the nonbidirectional case, most convergence results are conditional [41, 44, 53].^5
The standard assumption is that some form of joint connectivity property should hold in
perpetuity. To check such a property is in general undecidable (see why below), so these
convergence results are somewhat of a heuristic nature. A significant recent advance was
Bruin and Deane's unconditional resolution of planar piecewise contractions, which are
special kinds of influence systems with a single mobile agent [9]. Our main result can be
interpreted as a grand generalization of theirs.

4 This is the language of geometry and algebra with statements specified by any number of
quantifiers and polynomial (in)equalities. It was shown to be decidable by Tarski and
amenable to quantifier elimination and algebraic cell decomposition by Collins.

5 As they should be, since convergence is not assured. An exception is truth-seeking HK
systems, which have been shown to converge unconditionally [17].

Theorem 1.1. Given any initial state, the orbit of an influence system is attracted
exponentially fast to a limit cycle almost surely under an arbitrarily small perturbation.
The period and preperiod are bounded by a polynomial in the reciprocal of the failure
probability. Without perturbation, the system can be Turing-complete. In the bidirectional
case, the system is attracted to a fixed point in time $n^{O(n)} \log \frac{1}{\varepsilon}$
almost surely, where $n$ is the number of agents and $\varepsilon$ is the distance to the
fixed point.

The theorem bounds the convergence time of bidirectional systems by a single exponential
and establishes the asymptotic periodicity of generic influence systems. These
results are essentially optimal. We also estimate the attraction rate of general systems
but the bounds we obtain are probably too conservative to be useful. Perturbing the
system means replacing each hyperplane $a^T x = a_0$ of the SP by $a^T x = a_0 + \delta$,
for some arbitrarily small random $\delta$. Note that neither the initial state nor the
transition matrices are perturbed.^6 We enforce an agreement rule, which sets $G_{ij}$ to be
constant over the microscopic slab $|x_i - x_j| < \varepsilon_0$, for an arbitrarily small
$\varepsilon_0 > 0$.^7 Intuitively, the agreement rule stipulates that minute fluctuations
of opinion between two agents otherwise in full agreement should have no macroscopic effect
on the system.^8 We emphasize that both the perturbation and the agreement rule are
necessary: without them, the attraction claims of Theorem 1.1 are provably false.^9 We show
that finely tuned influence systems are indeed Turing-complete.

Our work resolves the long-term behavior of a fundamental natural process which
includes the extended family of HK systems as a special case. The high generality of our
results precludes statements about particular restrictions which might be easier. A good
candidate for further investigation is the heterogeneous bounded-confidence model, where
each $G_{ij}$ is defined by a single interval, and which is conjectured to converge [39]. (We
show below that this is false if the averaging is not perfectly uniform.) Such systems were
not even known to be periodic, a feature that our result implies automatically. Generally,
our work exposes a surprising gap in the expressivity of directed and undirected dynamic
networks: while the latter always lead to stable agreement (of a consensual, polarized,
or fragmented nature), directed graphs offer a much richer complexity landscape.

The second contribution of this work is the introduction of a new brand of bifurcation
analysis based on algorithmic renormalization. In a nutshell, we use a graph algorithm
to decompose a dynamical system into a hierarchy of recursively defined subsystems.
We then develop a tensor calculus to "compile" the graph algorithm into a bifurcation
analysis. The tension between energy and entropy is then reduced to a question in
matrix rigidity.

6 This is not a noise model: the perturbation happens only once at the beginning.

7 Agent $i$ is free to set the function $G_{ij}$ to either 0 or 1. For notational
convenience, we set $\varepsilon_0$ to be $n^{-O(1)}$; smaller values would work just the same.

8 Interestingly, this is precisely meant to prevent the "narcissism of small differences,"
identified by Freud and others as a common source of social conflicts.

9 In the nonbidirectional case, agents are made to enforce a timeout mechanism to prevent
edges from reappearing after an indefinite absence of unbounded length. While probably
unnecessary, this minor technical feature seems to simplify the proof.

In the context of social dynamics, Theorem 1.1 might be somewhat disconcerting. 
Influence systems model how people change opinions over time as a result of human 
interaction and knowledge acquisition. Our results show that, unless people keep varying 
the modalities of their interactions, as mediated by trust levels, self-confidence, etc, they 
will be caught forever recycling the same opinions in the same order. The saving grace 
is that the period can be exponentially long, so the social agents might not even realize 
they have become little more than a clock... 



2 The Complexity of Influence Systems 

Piecewise-linear systems are known to be Turing-complete [2, 7, 30, 50]. A typical
simulation relies on the existence of Lyapunov exponents of both signs, negative ones to
move the head in one direction and positive ones to move it the other way. Influence
systems have no positive exponents and yet are Turing-complete, as we show below. In
dynamics, chaos is typically associated with positive topological entropy, which entails
expansion, hence positive Lyapunov exponents. But piecewise linearity blurs this picture
and produces surprises. For example, isometries (with only null Lyapunov exponents)
are not chaotic [11] but, paradoxically, contractions (with only negative exponents) can
be [32]. Influence systems, which, with only null and negative Lyapunov exponents, sit
in the middle, can be chaotic. The spectral lens seems to break down completely in the
face of piecewise linearity!



Exponential periods. It is an easy exercise to use higher bit lengths to increase
the period of an oscillating influence system by any amount. More interesting is the
observation that the period can be raised to exponential with only logarithmic bit length.
We simulate a counter modulo 2 by building a system with $n = 3$: the first two agents
are fixed at 0 and 3 while the third oscillates between positions 1 and 2; this is trivially
achieved with a two-test linear decision tree. Add another mobile agent oscillating
between 1 and 2 like the previous one, but which moves only when the first oscillating
agent is at position 1. (Adding a single test makes this possible.) Iterating in this
fashion produces an $n$-agent influence system with $O(n)$ tests whose period is exactly
$2^{n-2}$.
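The counter construction can be sketched in a few lines. The code below is only an illustration under one reading of "iterating in this fashion": it assumes that each mobile agent moves exactly when all the mobile agents before it sit at position 1 (the first one always moves), and that a move is the diffusive step of averaging with the fixed agent at 3 (from 1 to 2) or at 0 (from 2 to 1). The function names `step` and `period` are hypothetical.

```python
def step(state):
    """Advance the mobile agents by one synchronous step.

    `state` lists the positions (1 or 2) of the mobile agents; the two fixed
    agents sit at 0 and 3 and never move, so they are left implicit.
    """
    new_state = []
    lower_all_at_one = True  # vacuously true for the first mobile agent
    for pos in state:
        if lower_all_at_one:
            # Diffusive move: average with the fixed agent at 3 (from 1)
            # or with the fixed agent at 0 (from 2).
            new_state.append((pos + 3) // 2 if pos == 1 else (pos + 0) // 2)
        else:
            new_state.append(pos)
        # Assumed rule: the next agent moves only if every earlier mobile
        # agent was at position 1 before this step.
        lower_all_at_one = lower_all_at_one and pos == 1
    return tuple(new_state)


def period(n):
    """Period of the n-agent counter (n - 2 mobile agents), by brute force."""
    start = (1,) * (n - 2)
    state, t = step(start), 1
    while state != start:
        state, t = step(state), t + 1
    return t
```

Under these assumptions the mobile agents behave like the bits of a binary counter, which is why `period(n)` returns a power of two in `n - 2`.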

For a system where action and control are more closely mixed, consider implementing
$\mathbb{Z}/2\mathbb{Z}$ as a 3-agent influence system by fixing the first two agents at
positions 0 and 3, respectively, and then letting the third one oscillate between 1 and 2.
By adding $O(q)$ discontinuities, we extend this scheme to keep an agent cycling through
$1, \dots, q$, and then back to 1, thus implementing $\mathbb{Z}/q\mathbb{Z}$. Repeating
this construction for the first $k$ primes $p_1 < \cdots < p_k$ allows us to implement the
system based on the direct sum
$\mathbb{Z}/p_1\mathbb{Z} \oplus \cdots \oplus \mathbb{Z}/p_k\mathbb{Z}$, which has period
$\prod_{j \le k} p_j$ for a total of $N = O(p_1 + \cdots + p_k)$ agents and
discontinuities. By the prime number theorem [42], this gives us a period of
length $2^{\Omega(\sqrt{N \log N})}$. While the period of the system as a whole is huge,
each agent cycles through a short periodic orbit: this is easily remedied by adding another
agent attracted to the mass center of the $k$ cycling agents. By the Chinese Remainder
Theorem, that last agent has an exponential period and acts as a sluggish clock.



Chaos and perturbation. Perturbation is needed for several reasons, including uniform
bounds on the time to stationarity. We focus here on the agreement rule and show
why it is necessary by designing a chaotic system that is resistant to perturbation. We
use a total of four agents. The first two agents stay on opposite sides of 0.5, with the
one further from 0.5 moving toward it:

$$(x_1, x_2) \mapsto \frac{1}{2} \begin{cases} (2x_1,\ x_1 + x_2) & \text{if } x_1 + x_2 > 1 \\ (x_1 + x_2,\ 2x_2) & \text{else.} \end{cases}$$

The two agents converge toward 0.5 but the order in which they proceed (ie, their
symbolic dynamics) is chaotic. To turn this into actual chaos, we introduce a third
agent, which oscillates between a fourth agent fixed at $x_4 = 0$ and $x_1$ (which is
roughly 0.5), depending on the order in which the first two agents move:
$x_3 \mapsto \frac{1}{3}(x_3 + 2x_1)$ if $x_1 + x_2 > 1$ and
$x_3 \mapsto \frac{1}{3}(x_3 + 2x_4)$ else. Assume that $x_1(0) < \frac{1}{2} < x_2(0)$ and
consider the trajectory of a line $L$: $X_2 - \frac{1}{2} = u(X_1 - \frac{1}{2})$, for
$u < 0$. If the point $(x_1(t), x_2(t))$ is on the line, then $x_1(t) + x_2(t) > 1$ implies
that $u < -1$ and $L$ is mapped to
$X_2 - \frac{1}{2} = \frac{1}{2}(u + 1)(X_1 - \frac{1}{2})$; if $x_1(t) + x_2(t) < 1$, then
$u > -1$ and $L$ becomes $X_2 - \frac{1}{2} = \frac{2u}{u+1}(X_1 - \frac{1}{2})$.

The parameter $u$ obeys the dynamics: $u \mapsto \frac{1}{2}(u + 1)$ if $u < -1$ and
$u \mapsto 2u/(u + 1)$ if $-1 < u < 0$. Writing $u = (v + 1)/(v - 1)$ gives
$v \mapsto 2v + 1$ if $v < 0$ and $v \mapsto 2v - 1$ else.
(Geometrically, $v$ is the tangent of the angle between $L$ and the line $X + Y = 0$.) The
system $v$ escapes for $|v(0)| > 1$ and otherwise conjugates with the baker's map [22] via
the variable change: $v = 2w - 1$. Agent 3 is either at most 1/6 or at least 1/3 depending
on which of agent 1 or 2 moves. This implies that the system has positive topological
entropy: to know where agent 3 is at time $t$ requires on the order of $t$ bits of accuracy
in the initial state. The cause of chaos is the first two agents' convergence toward the SP
discontinuity. It is immediate that no perturbation can prevent this, so the agreement
rule is indeed needed.
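The conjugacy with the doubling map can be checked on a concrete orbit. The following is a minimal sketch, assuming the reading of the two-agent map given above (the agent further from 1/2 moves to the midpoint of the pair) and exact rational arithmetic. It uses the fact that, when $x_1 < \frac{1}{2} < x_2$, the variable $w = (v+1)/2$ equals $(x_2 - \frac{1}{2})/(x_2 - x_1)$; the helper names `step`, `w_of` and `orbit` are hypothetical.

```python
from fractions import Fraction

HALF = Fraction(1, 2)


def step(x1, x2):
    """One step of the two-agent map: the agent further from 1/2 moves."""
    if x1 + x2 > 1:
        return x1, (x1 + x2) / 2   # agent 2 moves to the midpoint
    return (x1 + x2) / 2, x2       # agent 1 moves to the midpoint


def w_of(x1, x2):
    """The conjugating variable w = (v + 1) / 2, for x1 < 1/2 < x2."""
    a, b = HALF - x1, x2 - HALF    # distances of the two agents to 1/2
    return b / (a + b)


def orbit(x1, x2, steps):
    """Return the list of (symbol, w) pairs along the orbit.

    The symbol is 1 when x1 + x2 > 1 (agent 2 moves) and 0 otherwise.
    """
    out = []
    for _ in range(steps):
        out.append((1 if x1 + x2 > 1 else 0, w_of(x1, x2)))
        x1, x2 = step(x1, x2)
    return out
```

For instance, starting from $x_1 = 3/10$ and $x_2 = 13/20$ gives $w = 3/7$, and successive values of $w$ follow $w \mapsto 2w \bmod 1$, so the symbol sequence is the binary expansion of $3/7$. Initial states whose $w$ is a dyadic rational eventually land on the discontinuity and are not handled by this sketch.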



Turing completeness. Absent perturbation and the agreement rule, an influence
system can simulate a piecewise-linear system and hence a Turing machine. Here is
how. Given a nonzero $n$-by-$n$ real-valued matrix $A$, let $A^+$ (resp. $A^-$) be the
matrix obtained by zeroing out the negative entries of $A$ (resp. $-A$), so that
$A = A^+ - A^-$. Define the matrices

$$B = \rho \begin{pmatrix} A^+ & A^- \\ A^- & A^+ \end{pmatrix} \quad \text{and} \quad C = \begin{pmatrix} B & (\mathrm{Id} - B)\mathbf{1} & 0 \\ 0 & 1 & 0 \\ 0 & 1 - \rho & \rho \end{pmatrix},$$

where $\rho$ is the reciprocal of the maximum row-sum in the matrix derived from $A$ by
taking absolute values. It is immediate that $C$ is stochastic and conjugates with the
dynamics of $A$. Indeed, given $x \in \mathbb{R}^n$, if $\bar{x}$ denotes the
$(2n + 2)$-dimensional column vector $(x, -x, 0, 1)$, then $C\bar{x} = \rho\,\overline{Ax}$;
hence the commutative diagram:

$$\begin{array}{ccc} x & \longrightarrow & Ax \\ \downarrow & & \downarrow \\ \bar{x} & \longrightarrow & \rho^{-1} C \bar{x}. \end{array}$$
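The identity $C\bar{x} = \rho\,\overline{Ax}$ can be verified on a small example. The sketch below assumes the following block forms, which are a reading of the construction rather than a quotation of it: $B = \rho\,[A^+\ A^-;\ A^-\ A^+]$, and $C$ obtained by appending to $B$ a column $(\mathrm{Id} - B)\mathbf{1}$ and a zero column, followed by the two rows $(0, 1, 0)$ and $(0, 1-\rho, \rho)$. It also assumes that the maximum absolute row-sum of $A$ is at least 1, so that $1 - \rho \ge 0$. The names `build_C`, `lift` and `matvec` are hypothetical.

```python
from fractions import Fraction


def build_C(A):
    """Build the (2n+2)-by-(2n+2) matrix C from an n-by-n matrix A.

    Assumes the block structure described in the lead-in and a maximum
    absolute row-sum of at least 1 (so that every entry is nonnegative).
    """
    n = len(A)
    rho = Fraction(1) / max(sum(abs(a) for a in row) for row in A)
    pos = [[max(a, 0) for a in row] for row in A]    # A^+
    neg = [[max(-a, 0) for a in row] for row in A]   # A^-
    B = [[rho * v for v in pos[i] + neg[i]] for i in range(n)]
    B += [[rho * v for v in neg[i] + pos[i]] for i in range(n)]
    # Pad each row of B with its slack, then a zero column.
    C = [row + [1 - sum(row), 0] for row in B]
    C.append([0] * (2 * n) + [1, 0])             # stochasticity agent stays put
    C.append([0] * (2 * n) + [1 - rho, rho])     # projectivity agent scales by rho
    return C, rho


def lift(x):
    """Map x to the lifted vector (x, -x, 0, 1)."""
    return list(x) + [-v for v in x] + [0, 1]


def matvec(M, v):
    """Plain matrix-vector product on nested lists."""
    return [sum(m * c for m, c in zip(row, v)) for row in M]
```

With `A = [[2, -1], [0, 3]]` and `x = [Fraction(1, 2), -2]`, one step of `C` on `lift(x)` agrees with `lift(matvec(A, x))` scaled by `rho`, which is the commutative diagram in coordinates.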

Imagine now a piecewise-linear system consisting of a number of matrices $\{A\}$ and an
SP. We add $n$ negated clones to the existing set of $n$ agents, plus a stochasticity agent
permanently positioned at $x_{-1} = 0$ and a projectivity agent initially at $x_0$. This
allows us to form the vector $\bar{x} = (x, -x, x_{-1}, x_0)$. The system scales down, so we
must projectify the SP by rewriting with homogeneous coordinates any $a^T x = a_0$ as
$a^T x = a_0 x_0$. We can use the same value of $\rho$ throughout by picking the smallest
one among all the matrices $A$ used in the piecewise-linear system.

Koiran et al [30] have shown how to simulate a Turing machine with a 3-agent
piecewise-linear system, so we set $n = 3$. We need an output agent to indicate whether
the system is in an accepting state: this is done by pointing to one of two fixed agents.
We can enlist one of the three original agents for that purpose, which brings the agent
count up to $2n + 3 = 9$. Predicting basic state properties of an influence system is
therefore undecidable. With a few more agents, we can easily encode as an undecidable
question whether the communication graphs (or their union over bounded time windows)
satisfy a certain connectivity property infinitely often.



Linearization. We show how to linearize an influence system by tensor powering (for
any $d$). Let $D$ be the maximum total degree of the polynomial tests used in the algebraic
decision trees (recall that each agent comes equipped with its own). We can always
assume the existence of an agent confined to position 1 with no in/out-link: we use it to
homogenize the test polynomials, so that every monomial has degree exactly $D$. Given
$x = (x_1, \dots, x_n) \in \mathbb{R}^n$, we define the monomial
$y_{k_1,\dots,k_D} = \prod_{i=1}^{D} x_{k_i}$ ($1 \le k_1, \dots, k_D \le n$)
and, listing them in lexicographic order, form
$y = (y_{k_1,\dots,k_D}) \in \mathbb{R}^N$, where $N = n^D$; note
that $y$ lies on a (real) algebraic variety $V$ smoothly parametrized injectively by $x$.
The map $x \mapsto f(x)$ induces the lifted map $y \mapsto g(y)$, where
$g(y) = P(x)^{\otimes D} y$ and

$$P(x)^{\otimes D} = \underbrace{P(x) \otimes \cdots \otimes P(x)}_{D}.$$

Being the Kronecker product of stochastic matrices, $P(x)^{\otimes D}$ is stochastic: its
diagonal is positive and its nonzero entries all exceed $\rho^D$. Its associated graph,
whose edges map out its nonzero entries, is the tensor graph product $G(x)^{\otimes D}$. We
use the term ground agents to refer to the $n$ agents positioned at $x$. Including all the
test polynomials from all the ground agents' decision trees gives us as many hyperplanes
in $\mathbb{R}^N$ and the sign conditions of a cell $c$ specify a unique stochastic matrix
$Q_c$. This matrix is always a tensor power $P^{\otimes D}$ but it is guaranteed to be of
the form $P(x)^{\otimes D}$ only if $c$ contains a point $y$ of $V$ parametrized by $x$.
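The lifting can be checked numerically in the smallest nontrivial case. The sketch below assumes a tensor power of order 2 and a hypothetical 3-agent stochastic matrix `P` with positive diagonal; it verifies that the Kronecker square is again stochastic and that applying it to the lifted vector of pairwise products agrees with lifting the image of `x` under `P`. All names are illustrative.

```python
from fractions import Fraction


def kron(A, B):
    """Kronecker product of two matrices given as nested lists."""
    return [[a * b for a in ra for b in rb] for ra in A for rb in B]


def kron_vec(x, y):
    """Kronecker product of two vectors: all products x_i * y_j, in
    lexicographic order of (i, j)."""
    return [a * b for a in x for b in y]


def matvec(M, v):
    """Plain matrix-vector product on nested lists."""
    return [sum(m * c for m, c in zip(row, v)) for row in M]


# Hypothetical row-stochastic matrix with a self-loop at every agent.
P = [
    [Fraction(1, 2), Fraction(1, 2), Fraction(0)],
    [Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)],
    [Fraction(0), Fraction(1, 3), Fraction(2, 3)],
]
x = [Fraction(1, 5), Fraction(1, 2), Fraction(9, 10)]

P2 = kron(P, P)          # the order-2 tensor power of P
y = kron_vec(x, x)       # lifted state: all degree-2 monomials x_i * x_j
lifted_image = matvec(P2, y)
image_lifted = kron_vec(matvec(P, x), matvec(P, x))
```

Here `lifted_image` and `image_lifted` coincide, which is the mixed-product property of the Kronecker product specialized to vectors.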

Whereas a random shift produces affine forms $a_1 y_1 + \cdots + a_N y_N + \delta$, the
agreement rule acts in a more subtle way. While the whole point of the lifting is to forget
about the variety $V$, the tensor structure of the matrices $Q_c$ brings benefits we can
exploit. Given $K \subseteq \{1, \dots, n\}$, the cluster $C_K$ refers to the subset of
$|K|^D$ agents with labels in $K^D$. If all the agents of a cluster fit within a tiny
interval then so do their ground agents; to see why, just expand $(x_i - x_j)^D$. By the
agreement rule, therefore, the induced subgraph of the cluster cannot change until it is
pulled apart by outside agents. Assume now that $d > 1$. We write

$$x = (x_{1,1}, \dots, x_{1,d}, \dots, x_{n,1}, \dots, x_{n,d}),$$

with the homogenizing agent 1 permanently positioned at
$(x_{1,1}, \dots, x_{1,d}) = \mathbf{1}_d$. Next, we define $y = (y_1, \dots, y_N)$, where
$N = (dn)^D$ and $y_l = \prod_{i=1}^{D} x_{k_i, j_i}$, with $l$ denoting the
lexicographic rank of the string $(k_1, j_1, \dots, k_D, j_D)$ for
$k_i \in \{1, \dots, n\}$ and $j_i \in \{1, \dots, d\}$.
The matrix $Q_c$ associated with cell $c$ is of the form
$(P \otimes \mathrm{Id})^{\otimes D}$; furthermore, $P = P(x)$ whenever $y$ satisfies the
$N$ conditions $y_l = \prod_{i=1}^{D} x_{k_i, j_i}$ for some $x \in \mathbb{R}^{dn}$. The
cluster $C_K$ consists now of $(d|K|)^D$ agents.



Nonconvergent HK systems. Heterogeneous HK systems [26, 27] are influence systems
where each agent $i$ is associated with a confidence value $r_i$ and the communication
graph links $i$ to any agent $j$ such that $|x_i - x_j| \le r_i$. We design a periodic
5-agent system with period 2. We start with a 2-agent system with $r_1 = r_2 = 2$. Instead
of uniform averaging, we decrease the self-confidence weight so that, if the two agents are
linked, then $x_1 \mapsto \frac{1}{3}(x_1 + 2x_2)$ and $x_2 \mapsto \frac{1}{3}(2x_1 + x_2)$.
Any self-confidence weight less than 0.5 would work, too; of course, a weight of zero makes
the problem trivial and uninteresting. If the agents are initially positioned at $-1$ and
1, they oscillate around 0 with $x_i = (-1)^{i+t}\, 3^{-t}$ at time $t$. Now place a copy of
this system with its center at $X = 2$ and a mirror-image copy at $X = -2$; then place a
fifth agent at 0 and link it to any agent at distance at most 2. As indicated in Fig. 1,
even though the agents themselves converge, their communication graph does not.
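The alternation is easy to reproduce for the 2-agent building block. The sketch below simulates only that pair (not the full 5-agent system, and with no feedback from the fifth agent), checks nothing by itself, and records at each step which of the two agents lies at or below the pair's center. For the copy centered at $X = 2$, that is the agent within distance 2 of the origin, hence the one the fifth agent would link to. The function names are hypothetical.

```python
from fractions import Fraction


def pair_step(x1, x2):
    """Weighted averaging with self-confidence 1/3, as in the text."""
    return (x1 + 2 * x2) / 3, (2 * x1 + x2) / 3


def simulate(steps):
    """Return a list of (x1, x2, near) triples for times 0..steps-1.

    Positions are measured relative to the pair's center; `near` is the
    index (1 or 2) of the agent at or below the center.
    """
    x1, x2 = Fraction(-1), Fraction(1)
    history = []
    for _ in range(steps):
        history.append((x1, x2, 1 if x1 <= 0 else 2))
        x1, x2 = pair_step(x1, x2)
    return history
```

The two agents swap sides at every step while shrinking toward the center by a factor of 3, so the `near` index alternates 1, 2, 1, 2, and so on; this is the source of the period-2 behavior of the communication graph.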
Figure 1: The communication graph of the HK system alternates between these two
configurations.

3 An Overview of the Proof



Given the challenge of presenting the multiple threads of the argument in digestible 
form, we begin with a bird's eye view of the proof. The standard way to establish the 
convergence of an algorithm or a dynamical system is to focus on a single unknown input 
and track the rate at which the system expends a certain resource on its way toward 
equilibrium: a potential function in algorithms; a free energy in statistical physics; or 
a Lyapunov function in dynamics. This approach cannot work here. Instead, we need 
to study the system's action on all inputs at once. This is probably the single most 
important feature distinguishing natural algorithms from the classical kind: because 
they run forever, qualitative statements about their behavior will sometimes require a 
global view of the algorithm's actions with respect to all of its inputs. For this, we need 
a language that allows us to model the evolution of phase space as a single geometric 
object. This is our next topic. As explained earlier, we may assume that d = 1. 

The coding tree. This infinite rooted tree encodes into one geometric object the set
of all orbits and the full symbolic dynamics. It is the system's "Rosetta stone," from
which everything of interest can be read off. The coding tree $T$ is embedded in
$\Omega^n \times \mathbb{R}$, where $\Omega = (0, 1)$ and the last dimension represents
time.^10 Each child $v$ of the root is associated with an atom $U_v$, while the root itself
stands for the phase space $\Omega^n$. The phase tube $(U_v, V_v)$ of each child $v$ is the
"time cylinder" whose cross-sections at times 0 and 1 are $U_v$ and $V_v = f(U_v)$,
respectively. In general, a phase tube is a discontinuity-avoiding sequence of iterated
images of a given cell in phase space. The tree is built recursively by subdividing $V_v$
into the cells $c$ formed by its intersection with the atoms, and attaching a new child $w$
for each $c$: we set $V_w = f(c)$ and $U_w = U_v \cap f^{-t_v}(c)$, where $t_v$ is the depth
of $v$ (Fig. 2). The phase tube $(U_v, V_v)$ consists of all the cylinders whose
cross-sections at $t = 0, \dots, t_v$ are, respectively,
$U_v, f(U_v), \dots, f^{t_v}(U_v) = V_v$. Intuitively, $T$ divides up the phase space into
maximal regions over which the iterated map is linear.

The coding tree has three structural parameters that require investigation. One 
of them is combinatorial. Label each node w of the tree by the unique atom that 
contains the cell c defined above. This allows us to interpret any path as a word of 
atom labels and define the language L(T) of all such words: the word-length growth of 
L(T) plays a central role, which we capture with the word-entropy (formal definitions 
below). The two other parameters are geometric: the thinning rate tells us how fast the 
tree's branches thin out; the attraction rate tells us how close to "periodic" the branches 
become. Whereas the latter concerns the behavior of single orbits, the thinning rate 
indicates how quickly a ball in the space of orbits contracts with time, or equivalently 
how quickly the distribution of agent positions loses entropy. 

How do we read periodicity off from the coding tree? Intuitively, one would expect
that, at some time $\nu$ called the nesting time, for every $v$ of depth $t_v = \nu$, there
exists $w$ at the same depth with $V_v \subseteq U_w$. In other words, the bottom sections
of the phase tubes will, suitably permuted, fit snugly within the top sections. This is not
always true, however, and finding necessary conditions for it requires a delicate
bifurcation analysis. Fig. 3 suggests a visual rule-of-thumb to guide our intuition in
distinguishing between chaos and periodicity: the set $\mathcal{R}$ consists of the points
in phase space where the map $f$ is not continuous.

10 By convexity, we can restrict the phase space to $\Omega^n$.

Figure 2: Node $w$ at depth $t_w = 2$ in the coding tree, with its phase tube $(U_w, V_w)$
(curved for aesthetic reasons). The cell $U_w$ lies within a single atom whereas $V_w$
splits over two of them. Time points downwards. We have represented only two of the
possibly many children of the root.




Figure 3: Two scenarios for the iterated preimages of the SP discontinuities $\mathcal{R}$:
the set $\mathcal{R} \cup f^{-1}(\mathcal{R}) \cup \cdots \cup f^{-t}(\mathcal{R})$ depicted
on the left seems to spread everywhere in phase space so as to cover all of it eventually,
a symptom of chaos; the set on the right tends to fall into clusters or escape outside of
$\Omega^n$, a sign of periodicity.

The algorithmic pipeline. We assemble the coding tree by glueing together smaller
coding trees defined recursively. We entrust this task to the arborator, an algorithm
expressed in a language for "lego-like" assembly. The arborator needs two (infinite)
sets of parameters to do its job, the coupling times and the renormalization scales. To
produce these numbers, we use the flow tracker, an algorithm that, in the bidirectional
case, works roughly like this: (i) declare agent 1 wet; (ii) any dry agent becomes wet as
soon as it links to a wet one; (iii) if all agents ever become wet, dry them all and go
back to (i). The instants $t_k$ at which wetness propagates constitute the coupling times;
the renormalization scales are given by the number $w_k$ of wet agents at time $t_k$. The
key idea is that, between two coupling times $t_k$ and $t_{k+1}$, the system breaks up into
two subsystems with interaction between them going only in one direction: from wet to
dry.^11 We denote by $A(p \to q)$ an influence system that consists of two groups of size
$p$ and $q$, with none of the $q$ agents ever linking up to any of the $p$ agents. This
allows us a recursive decomposition of the overall system:

    $A(n \to 0)$:
        For $k = 1, 2, \dots$
            Run $A(w_k \to n - w_k)$ and $A(n - w_k \to 0)$ concurrently
            between times $t_k$ and $t_{k+1}$.



This formulation is of interest only if we can bound $t_{k+1} - t_k$. This is done
implicitly by recursively monitoring the long-term behavior of the two subsystems and
inferring from it the possibility of further wetness propagation. The flow tracker is a
syntactical device because it merely monitors the exchange of information among agents with
no regard for what is done with it. By contrast, the arborator models the agents'
interpretation of that information into a course of action. The arborator is assembled as a
recursive arithmetic expression over four operations: $\oplus$, $\otimes$, absorb, and
renorm (Fig. 4). It comes with a dictionary that spells out the effect of each term on the
coding tree's structural parameters. Here is a quick overview:



• The direct sum $\oplus$ models the parallel execution of two independent subsystems.
Think of two agents, Bob and Alice, interacting with each other in one corner of
the room while Carol and David are chatting on the other side. The coding tree
of the whole is the (pathwise) Cartesian product of both two-agent coding trees.

• The direct product $\otimes$ performs tree surgery. It calls upon another primitive,
absorb, to prune the trees and prepare their phase tubes for "glueing." Imagine
Alice suddenly turning to Carol and addressing her. The flow tracker records that
the two groups, Bob-Alice and Carol-David, are no longer isolated. Since this
might not have happened had Alice been at a slightly different location, the phase
tube leading to this event may well split into two parts: one bearing witness to the
new interaction; and the other extending the direct sum unchanged. By analogy
with the addition of an absorbing state to a Markov chain, the first operation is
called absorb.^12

• The primitive renorm, so named for its kinship with the renormalization group of
statistical physics, uses the renormalization scales to compress subtrees into single
nodes so as to produce (nonuniform) time rescaling.

11 This does not mean that the dynamics within the dry agents is not influenced by the wet
ones: only that dry agents do not include wet ones in the averaging. The standout exception
is the case of a metrical system, where the dry agents act entirely independently of the
wet ones between $t_k$ and $t_{k+1}$.

Figure 4: The two tensor products.



Attraction and chaos. The occurrence of chaos is mediated by the tension between 
two forces: dissipation causes the phase tubes to become thinner, which favors peri- 
odicity; phase tube splitting produces a form of expansion conducive to chaos. Two 
arbitrarily close orbits can indeed diverge wildly once they fall on both sides of a dis- 
continuity. The phase tubes snake around phase space while getting thinner at an 
exponential rate, so hitting SP discontinuities should become increasingly rare over time.
The problem is that branching multiplies the chances of hitting discontinuities. For dis- 
sipation to overcome branching, the average node degree should be small. To show this 
is indeed the case requires a fairly technical rank argument about the linear constraints 
implied by the splitting of a phase tube. 

The thinning rate is about contraction, not attraction. To see why, consider a trivial
system with only self-loops: it is stuck at a fixed point, yet the agents' marginal
distributions suffer no loss of entropy.^13 The information-theoretic interpretation of
thinning is illuminating. As agents are attracted to a limit cycle, they lose memory of
where they came from, something that would not happen in a chaotic system. Paradoxically,
interaction can then act as a memory recovery device and thus delay the onset of
periodicity.

12 The dynamics multiplies transition matrices to the left. Looking at it dually, the rightward products
model a random walk over a time-varying graph. The operation absorb involves adding a new leaf,
which is similar to adding an absorbing state; the direct product glues the root of another coding tree
at that leaf.

13 This is not to be confused with the word-entropy or the topological entropy.

Figure 5: Given the specification of the natural algorithm, the flow tracker computes the
coupling times and the renormalization scales, which are needed by the arborator to assemble
the coding tree. A dictionary allows us to bound the coding tree's structural parameters by
examining the arborator one component at a time. Renormalization makes this a recursive
process, hence the loop between the arborator and the parameters box.

Say the group Alice-Bob-Carol is isolated from David, until the latter decides to 
interact with Alice, thus taking in a fixed fraction of her entropy. Fast-forward. Alice 
is now caught in a limit cycle with Bob and Carol, while David has yet to interact with 
anyone since his earlier contact with Alice. His isolation means that he has had no 
chance to shed any of Alice's entropy. Although later caught in a periodic orbit, Alice 
might still be subject to tiny fluctuations, leading to a sudden interaction with David. 
When this happens, she will recapture part of the entropy she had lost: she will recover 
her memory! Happy as the news might be to her, this only delays the inevitable, which 
is being caught yet again in a limit cycle. Memory recovery cannot recur forever because 
David loses some of his own memory every time. In the end, because of dissipation, all 
the agents' memory will be lost. 



4 Algorithmic Dynamics 

We flesh out the ideas above, beginning with a simple local characterization of periodicity.
We then proceed to define the coding tree (§4.2), the arborator (§4.3), and the flow
tracker (§4.4).

4.1 Conditions for asymptotic periodicity



It is convenient to thicken the discontinuities. This does not change the dynamics of the
system and is used only as an analytical device. Fix a small parameter $\varepsilon > 0$ once and
for all, and, for any $t > 0$, define the margin $\mathcal{R}_\varepsilon$, where

$$\mathcal{R}_\varepsilon = \bigcup_{SP} \{\, x = (x_1, \ldots, x_n) \in \mathbb{R}^n : |a_0 + a_1 x_1 + \cdots + a_n x_n + \delta| \le \varepsilon \,\}, \qquad (1)$$

and the union extends over all the SP discontinuities. The margin is made of
closed slabs of width at least $\varepsilon n^{-O(1)}$. It is useful to classify the initial states by how
long it takes their orbits to hit $\mathcal{R}_\varepsilon$, if ever. With $f^0 = \mathrm{Id}$ and $\min \emptyset = \infty$, we define the
label of $x \in \Omega^n$ as

$$\ell(x) = \min \{\, t \ge 0 \mid f^t(x) \in \mathcal{R}_\varepsilon \,\}.$$

The point x is said to vanish at time $\ell(x)$ if its label is finite. As we shall see, the analysis
needs to focus only on the nonvanishing points. Write $S_t = \{\, x \in \Omega^n = (0,1)^n \mid \ell(x) \ge t \,\}$
for the set of points that do not vanish before time t: $S_0$ is $\Omega^n$; and, for $t > 0$,

$$S_t = \Omega^n \setminus \bigcup_{k=0}^{t-1} f^{-k}(\mathcal{R}_\varepsilon).$$

Each of its connected components is specified by a set of strict linear inequalities in $\mathbb{R}^n$,
so $S_t$ is a union of disjoint open n-cells, whose number we denote by $\#S_t$. We redefine
an atom to be a cell of $S_1$ and restrict the domain of f to these new atoms. Each cell of
$S_{t+1}$ lies within a cell of $S_t$. The limit set $S_\infty = \bigcap_{t \ge 0} S_t$ collects the points that never
vanish. Unlike those of $S_t$, its cells may not be open or full-dimensional.
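The label of a point can be computed by brute-force iteration. The sketch below is illustrative only: the names `label`, `step` and `in_margin`, the cutoff `t_max`, and the toy two-agent system (a single averaging matrix and one slab around $x_1 = 0.55$ of half-width 0.01) are assumptions introduced here. A return value of infinity means only that no vanishing was observed up to `t_max`.

```python
import math

def label(x, step, in_margin, t_max=1000):
    """First time t <= t_max at which the orbit of x lies in the margin.

    Returns math.inf if the orbit stays clear of the margin up to t_max,
    which is evidence of, but not a proof of, nonvanishing.
    """
    for t in range(t_max + 1):
        if in_margin(x):
            return t
        x = step(x)
    return math.inf

def step(x):
    # Hypothetical map: both atoms share the matrix (1/3) [[2, 1], [1, 2]].
    return ((2 * x[0] + x[1]) / 3, (x[0] + 2 * x[1]) / 3)

def in_margin(x):
    # Hypothetical margin: one closed slab around x_1 = 0.55, half-width 0.01.
    return abs(x[0] - 0.55) <= 0.01
```

Starting from (0.9, 0.1), the first coordinate takes the values 0.9, 0.6333..., 0.5444..., so the orbit enters the slab at time 2; the fixed point (0.5, 0.5) never does.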



Periodic sofic shifts. Any cell c of $S_\infty \subseteq S_1$ lies within a single atom, so we can
define $f_{|c}$ as the linear map corresponding to the transition matrix $P_c$. Since $S_\infty$ is
an invariant set, the image f(c) must, by continuity, lie entirely within a cell of $S_\infty$.
Suppose that $\#S_\infty < \infty$, a fact we will prove shortly. We define a directed graph F,
with each node labeled by a cell c of $S_\infty$ and with an edge (c, c'), labeled by $f_{|c}$, joining
c to the unique cell c' of $S_\infty$ that contains f(c). The system forms a sofic shift (ie, a
regular language over the edge labels). Furthermore, F is functional, meaning that each
node has exactly one outgoing edge (possibly a self-loop), so any infinite path ends up
in a cycle. The trajectory of a point x is the string $s(x) = c_0 c_1 \cdots$ of atoms that it
visits: $f^t(x) \in c_t$ for all $0 \le t < \ell(x)$. It is infinite if and only if x does not vanish,
so all infinite trajectories are eventually periodic. The weakness of this result is that
it might be a statement about the empty set. To strengthen it, we declare the system
to be nesting at t if no cell c of $S_t$ contains more than one cell of $S_{t+1}$. (This does not
mean that f(c) lies inside an atom.) The minimum value of t is called the nesting time
$\nu$ of the system. Observe that $\#S_\nu \ge \#S_t$, for any $t \ge \nu$. We bound the nesting time
and then proceed with an alternative characterization of nesting.

Lemma 4.1. Both the nesting time $\nu$ and the number of cells in $S_t$ are bounded by
$(n/\varepsilon)^{O(n)}$, for $t = 0, 1, \ldots, \infty$.

Proof. We begin with the second claim. If, in (1), we replace x by Px, for a stochastic
matrix P, the coefficients of the affine form remain polynomially bounded, so the cells
of $S_t$ are separated from one another by slabs of thickness at least $\varepsilon n^{-O(1)}$. A simple
volume argument implies an upper bound of $(n/\varepsilon)^{O(n)}$ on the number of such cells. To
bound the nesting time, consider this procedure: suppose that we have placed a special
point (called a witness) in each cell c of $S_t$. If c contains only one cell of $S_{t+1}$, we move its
witness to that unique cell; if it contains more than one cell, then we move the witness to
one of them and create new witnesses to supply the others; if c contains no cell of $S_{t+1}$,
we leave its witness in place. We carry out this process for $t = 0, 1, \ldots$, beginning with
a single witness in $S_0$. Witnesses may move around but never disappear; furthermore,
by the previous argument, any two of them are separated by at least $\varepsilon n^{-O(1)}$, so their
number is bounded by $(n/\varepsilon)^{O(n)}$. Any time t at which the system fails to be nesting
sees the creation of at least one new witness, and the first claim follows. □

Lemma 4.2. Given any cell c of $S_t$ and $k \le t$, the function $f^k_{|c}$ is linear. Given any cell
$b \subseteq \Omega^n$ and any linear function g, if $g(b) \setminus \mathcal{R}_\varepsilon$ is connected then so is $b \setminus g^{-1}(\mathcal{R}_\varepsilon)$.

Proof. To call $f^k_{|c}$ linear is to say that it is described by a single stochastic matrix over all
of c. We may assume that $t > 0$. Given a cell $c \subseteq S_t$, none of the cells $c, f(c), \ldots, f^{t-1}(c)$
intersect $\mathcal{R}_\varepsilon$, hence each one falls squarely within a single atom and $f^k_{|c}$ is linear for any
$k \le t$. For the second claim, note that, if the cell b intersects more than one connected
component of $\Omega^n \setminus g^{-1}(\mathcal{R}_\varepsilon)$, then it contains a segment pq and a point $r \in pq$ such that
g maps p and q outside of $\mathcal{R}_\varepsilon$ and r inside of it. By linearity, g(r) lies on the segment
g(p)g(q); therefore $g(b) \setminus \mathcal{R}_\varepsilon$ is nonconvex hence disconnected. □



Lemma 4.3. The nesting time $\nu$ is the minimum t such that $f^t(c) \setminus \mathcal{R}_\varepsilon$ is connected for
each cell c of $S_t$; as a corollary, if c is a cell of $S_\nu$, then f(c) intersects at most one cell
of $S_\nu$.

Proof. The claims are trivial if $\nu = 0$, so assume that $\nu > 0$. For the first claim, it
suffices to show that the system is nesting at time $t > 0$ if and only if $f^t(c) \setminus \mathcal{R}_\varepsilon$ is
connected for each cell c of $S_t$. For the "only if" part, we show why $f^t(c) \setminus \mathcal{R}_\varepsilon$ must be
connected. By Lemma 4.2, $f^t_{|c}$ is linear; therefore, since $c' = c \setminus f^{-t}(\mathcal{R}_\varepsilon)$ is connected so
is $f^t(c') = f^t(c) \setminus \mathcal{R}_\varepsilon$. Conversely, assuming that each set $f^t(c) \setminus \mathcal{R}_\varepsilon$ is connected, then
we identify the function g in Lemma 4.2 with $f^t_{|c}$ (in its linear extension) and conclude
that $c \setminus g^{-1}(\mathcal{R}_\varepsilon) = c'$ is connected, hence constitutes the sole cell of $S_{t+1}$ lying within c.
To prove the corollary, again we turn to Lemma 4.2 to observe that $f_{|c}, \ldots, f^{\nu}_{|c}$ are all
linear, hence so is $g = f^{\nu-1}_{|b}$, for $b = f(c)$. Our new characterization of nesting implies
that $f^{\nu}(c) \setminus \mathcal{R}_\varepsilon = g(b) \setminus \mathcal{R}_\varepsilon$ is connected, hence so is $b \setminus g^{-1}(\mathcal{R}_\varepsilon) = f(c) \setminus f^{1-\nu}(\mathcal{R}_\varepsilon)$.
Since f(c) lies entirely within a cell of $S_{\nu-1}$, the labels of its points are all at least $\nu - 1$.
Removing from f(c) the points of label $\nu - 1$ leaves the connected set $f(c) \setminus f^{1-\nu}(\mathcal{R}_\varepsilon)$;
therefore, f(c) can intersect at most one cell of $S_\nu$. □

We define the directed graph F with one node per cell c of $S_\nu$ and an edge from c
to c', where c' is the unique cell of $S_\nu$, if it exists, that intersects f(c). Every trajectory
corresponds to a directed path in F. The main difference with the previous graph is
that the converse is not true. Not only may a node lack an outgoing edge but, worse,
nothing in this framework keeps an orbit from going around a cycle for a while only
to vanish later. The previous lemma's failure to ensure that f(c) lies strictly within
another cell of $S_\nu$ puts periodicity in jeopardy. Perturbation is meant to get around
that difficulty. Periods and preperiods are defined with respect to the paths of F, not
trajectories: since the correspondence from paths to trajectories is not injective, the
latter may have shorter periods.

Lemma 4.4. The system is nesting at $\nu$ and any time thereafter. Any nonvanishing
orbit is eventually periodic and the sum of its period and preperiod is bounded by $\#S_\nu$.



The attraction rate. Assume that $\nu > 0$ and let c be a cell of $S_\nu$. Identifying the
nodes of F with their cells in $S_\nu$, we denote by $\sigma_0, \sigma_1, \ldots$ the path from $c = \sigma_0$. Let
j be the smallest index such that $\sigma_i = \sigma_j$ for some $i < j$. This defines the period
$p = p(c) = j - i$ and the preperiod $q = q(c) = i$, with $p + q \le \#S_\nu$. Given any $x \in c$,
its trajectory $s(x) = c_0 c_1 \cdots c_{\ell(x)-1}$ is such that $c_k$ is the atom containing the cell $\sigma_k$.
Furthermore, for any $q \le t < \ell(x)$,

$$f^t(x) = M_{t-q \,(\mathrm{mod}\, p)}\; Q^{\lfloor (t-q)/p \rfloor}\; f^q(x), \qquad (2)$$

where $M_k = P_{c_{q+k-1}} \cdots P_{c_q}$, for $k = 0, \ldots, p-1$, and $Q = M_p$, with $M_0$ the identity
matrix.^14
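The period and preperiod of a path in a functional graph can be found by walking from the start node until a node repeats. The following is a minimal sketch; the function name and the dictionary encoding of the graph are illustrative choices, not notation from the text.

```python
def period_and_preperiod(succ, start):
    """Return (p, q) for the path start, succ[start], succ[succ[start]], ...

    succ maps each node of a functional graph to its unique successor.
    With j the smallest index at which a node repeats and i the index of
    its first visit, the period is p = j - i and the preperiod is q = i.
    """
    first_seen = {}
    node, index = start, 0
    while node not in first_seen:
        first_seen[node] = index
        node = succ[node]
        index += 1
    q = first_seen[node]   # index i of the first visit to the repeated node
    p = index - q          # j - i
    return p, q
```

On the graph a → b → c → d → b, the path from a has preperiod 1 and period 3, so p + q = 4 equals the number of nodes, in line with the bound of Lemma 4.4.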



Because of the self-loops in the communication graphs, the powers of Q are
known to converge to a matrix $\widetilde{Q}$ [49]. Given c and $t > 0$, we define

$$\Pi_t = M_{t-q \,(\mathrm{mod}\, p)}\; \widetilde{Q}\; P_{c_{q-1}} \cdots P_{c_0}.$$

The approximation $\Pi_t$ is one of p matrices obtained by substituting $\widetilde{Q}$ for as many
"chunks" $Q = M_p$ as we can extract from the matrix product $P_{c_{t-1}} \cdots P_{c_0}$ that defines $f^t_{|c}$.
Note that this includes the case $t < p + q$, where no such chunk is to be found. Given
any real $\alpha > 0$, we define the attraction rate $\theta_\alpha$ as the maximum value of $\theta_\alpha(c)$, over all
cells c of $S_\nu$, where

$$\theta_\alpha(c) = \min \{\, \theta \ge 0 : \| f^t(x) - \Pi_{t \,(\mathrm{mod}\, p)}\, x \|_\infty \le \alpha, \ \text{for all } x \in c \text{ and } \theta \le t < \ell(x) \,\}. \qquad (3)$$

14 Note that $f^q(x) = P_{c_{q-1}} \cdots P_{c_0}\, x$, with the matrix denoting the identity if $q = 0$.

Suppose that Q can be written as

$$Q = \begin{pmatrix} A & C \\ 0 & B \end{pmatrix} \qquad (4)$$

and assume the existence of a limit matrix $\widetilde{B}$ such that $\|B^t - \widetilde{B}\|_{\max} \le e^{-\gamma t}$, for some
$\gamma > 0$. We tie the attraction rate to the maximum row-sum in A, which is itself related
to the thinning rate (whose formal definition we postpone).



Lemma 4.5. Given Q as in (4) and an upper bound $\mu$ on $\|A\mathbf{1}\|_\infty$ such that $e^{-\gamma} \le \mu < 1$,
for any $0 < \alpha < 1 - \mu$,

«- = °(r^)"*2- 

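Proof. For any $t > 0$,

$$Q^t = \begin{pmatrix} A^t & C_t \\ 0 & B^t \end{pmatrix}, \qquad \text{where } C_t = \sum_{k=0}^{t-1} A^{t-k-1} C B^k.$$

The matrix A is strictly substochastic ($\mu < 1$), so, by standard properties of a Markov
chain's fundamental matrix, $\sum_{k \ge 0} A^k = (I - A)^{-1}$; therefore, for $t > 0$,

$$C_t - (I - A^t)(I - A)^{-1} C \widetilde{B} = \sum_{k=0}^{t-1} A^{t-k-1} C B^k - \sum_{k=0}^{t-1} A^k C \widetilde{B} = \sum_{k=0}^{t-1} A^{t-k-1} C D_k,$$

where $D_k = B^k - \widetilde{B}$. Since C is substochastic, $\|C D_k\|_{\max} \le e^{-\gamma k}$. From $\|A^k \mathbf{1}\|_\infty \le \mu^k$,
we derive

$$\|A^{t-k-1} C D_k\|_{\max} \le \mu^{t-k-1} e^{-\gamma k}.$$

Since $\mu \ge e^{-\gamma}$, it follows that $\|C_t - (I - A^t)(I - A)^{-1} C \widetilde{B}\|_{\max} \le t \mu^{t-1}$; hence,

$$\widetilde{Q} = \begin{pmatrix} 0 & (I - A)^{-1} C \widetilde{B} \\ 0 & \widetilde{B} \end{pmatrix},$$

where, by $\|A^t\|_{\max} \le \mu^t$ and $\|(I - A)^{-1}\|_{\max} \le 1/(1 - \mu)$, $\|Q^t - \widetilde{Q}\|_{\max} = O(t \mu^{t-1}/(1 - \mu))$.
As a result, by Lemma 4.4, $\theta_\alpha \le q + pt \le (\#S_\nu)\, t$, if t satisfies $t \mu^{t-1} \le \alpha (1 - \mu) n^{-b}$,
for a large enough constant $b > 0$. □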



This next result argues that, although a vanishing point may take arbitrarily long to
do so, it comes close to vanishing fairly early. This gives us a useful analytical device
to avoid summing complicated series when estimating the probability that a point will
eventually vanish under random margin perturbation.

Lemma 4.6. Given any finitely-labeled point x in a cell c of $S_\nu$, there exists $t < \theta_\alpha +
p(c) + q(c)$ such that $f^t(x) \in \mathcal{R}_{2\varepsilon}$, for some $\alpha \ge \varepsilon n^{-O(1)}$.

Proof. We can obviously assume that $\ell(x) \ge \theta_\alpha + p(c) + q(c)$. For any t such that
$\theta_\alpha \le t < \ell(x)$, $f^t(x)$ lies in a ball of radius $\alpha$ centered at $\Pi_{t \,(\mathrm{mod}\, p(c))}\, x$. This means
that, between times $\theta_\alpha + q(c)$ and $\ell(x)$, the orbit of x lies entirely in the union of p(c)
balls of radius $\alpha$ and, by periodicity, each ball is visited before time $\theta_\alpha + p(c) + q(c)$.
Since x vanishes at time $\ell(x)$, one of these p(c) balls must intersect $\mathcal{R}_\varepsilon$. Thickening all
the margin slabs by a width of $4\alpha\sqrt{n}$ is enough to cover that ball entirely. If $\alpha = \varepsilon n^{-b}$
for a large enough constant b, replacing $\varepsilon$ by $2\varepsilon$ achieves the required thickening. □

Although nesting occurs within finite time, the strict inclusion $S_{k+1} \subset S_k$ may occur
infinitely often. We show why:



Example 4.1. Vanishing can take arbitrarily long. Consider the two-agent influence
system

$$\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} \mapsto \frac{1}{3} \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix},$$

with the SP discontinuities formed by the single slab

$$\mathcal{R}_\varepsilon = \{\, x \in \mathbb{R}^2 : |x_1 - 1 + \delta| \le \varepsilon \,\}.$$

For simplicity, assume the same linear map f in the two atoms. It follows that

$$f^t : \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} \mapsto \frac{1}{2} \begin{pmatrix} 1 + 3^{-t} & 1 - 3^{-t} \\ 1 - 3^{-t} & 1 + 3^{-t} \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}.$$

The set $S_\infty$ is the complement within $\Omega^2$ (the effective phase space) of

$$\bigcup_{t=0}^{\infty} \{\, x : |(1 + 3^{-t}) x_1 + (1 - 3^{-t}) x_2 - 2 + 2\delta| \le 2\varepsilon \,\}.$$

Note that if $\varepsilon = 0$, the number of cells in $S_\infty$ is infinite: they are defined by
an infinite number of lines passing through $(1 - \delta, 1 - \delta)$, with increasing slopes
tending to $-1$. As soon as we allow thickness $\varepsilon > 0$, however, the margin creates
only $O(|\log \varepsilon|)$ cells. Not all of them are open. To see this, consider the point
$2(1 - \delta + \varepsilon, 0)$. It never vanishes yet any neighborhood contains points that do.
Some points take arbitrarily long to vanish. Thickening the SP discontinuities into
slabs is a "finitizing" device meant to keep the number of $\infty$-labeled cells bounded.
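The closed form for the t-th power of the averaging matrix in this example is easy to check with exact rational arithmetic. The sketch below does so for small t; the helper names `mat_mul`, `mat_pow` and `closed_form` are introduced here for illustration only.

```python
from fractions import Fraction

def mat_mul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def mat_pow(P, t):
    """t-th power of a square matrix by repeated multiplication."""
    n = len(P)
    result = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    for _ in range(t):
        result = mat_mul(P, result)
    return result

# The matrix (1/3) [[2, 1], [1, 2]] of the example.
P = [[Fraction(2, 3), Fraction(1, 3)],
     [Fraction(1, 3), Fraction(2, 3)]]

def closed_form(t):
    """The claimed power: (1/2) [[1 + 3^-t, 1 - 3^-t], [1 - 3^-t, 1 + 3^-t]]."""
    s = Fraction(1, 3 ** t)
    return [[(1 + s) / 2, (1 - s) / 2],
            [(1 - s) / 2, (1 + s) / 2]]
```

The identity follows from the eigenvalues 1 and 1/3 of the matrix, with eigenvectors (1, 1) and (1, -1); comparing `mat_pow(P, t)` with `closed_form(t)` confirms it numerically for any chosen range of t.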



4.2 The coding tree 

The richly decorated tree T encodes the branching structure of the sets $S_t$ as a geometric
object in "phase space × time" $= \Omega^n \times [0, \infty)$. Recall that each atom c comes with its
own transition matrix $P_c$. Unless specified otherwise, a fixed perturbation value of $\delta$
is assumed once and for all. Think of $U_v$ and $V_v$ as the end-sections at times 0 and
$t_v$ of a phase tube containing all the orbits originating from $U_v$. At time $t_v$, the SP
discontinuities might split the tube. This happens only if $V_v$ intersects the margin,
which is the "branching condition" in the boxed algorithm. That intersection indicates
the vanishing time of some points in $U_v$, so we place a leaf as an indicator, and call it a
vanishing node. Whereas $U_v$ is an open n-cell, $V_v$ can be a cell of any dimension; hence
so can be the connected components of $V_v \setminus \mathcal{R}_\varepsilon$. For each one, c, we attach a new child
w to v and denote by $P_w$ the matrix of the map's restriction to c. The image of c at
time $t_v$, ie, $P_w c$, forms the end-section $V_w$ of a new phase tube from the root, whose
starting section $U_w$ is the portion of $U_v$ mapping to c at time $t_v$ (Fig. ul).^15



Building T

[1] The root v has depth $t_v = 0$; set $U_v \leftarrow V_v \leftarrow \Omega^n$.
[2] Repeat forever:
    [2.1] For each newly created node v:
        • If $V_v \cap \mathcal{R}_\varepsilon \ne \emptyset$ [ branching condition ]
          then create a leaf and make it a child of v.
        • For each cell c of $V_v \setminus \mathcal{R}_\varepsilon$, create a child w of v and
          set $P_w \leftarrow P_c$; $V_w \leftarrow P_w c$; $U_w \leftarrow U_v \cap f^{-t_v}(c)$.



Let $w w' w'' \cdots$ denote the upward, $t_w$-node path from w to the root (but excluding the
root). Using the notation $P_{\le w} = P_w P_{w'} P_{w''} \cdots$, we have the identity $V_w = P_{\le w} U_w$. No
point in $U_w$ vanishes before time $t_w$, and, in fact, $S_k = \bigcup_w \{\, U_w \mid t_w = k \,\}$. The points of
$S_\infty$ are precisely those whose orbits follow an infinite path $v_\infty = v_0, v_1, v_2, \ldots$ down the
coding tree. Each such path has its own limit cell $U_{v_\infty} = \bigcap_{t \ge 0} U_{v_t}$: collectively, these
form the cells of $S_\infty$. Example 4.1 features two infinite paths each of whose nodes has
two children, one vanishing and one not.

• The nesting time $\nu = \nu(T)$ is the minimum depth at which any node has at most
one nonvanishing child (Lemma 4.3); visually, below depth $\nu$, the tree consists of
single paths, some finite, others infinite, with vanishing leaves hanging off some of
them. A node v is deep if $t_v > \nu$ and shallow otherwise.

• The word-entropy h(T) is the logarithm of the number of shallow nodes.^16 As we
observed, $S_\nu = \bigcup_v \{\, U_v \mid t_v = \nu \,\}$; therefore $\#S_\nu \le 2^{h(T)}$.

• The period p(T) is the maximum value of p(c) for all cells $c = U_v$, with $t_v = \nu$.

• The attraction rate $\theta_\alpha(T)$ is the maximum value of the attraction rate for any
such c.

15 Note that $U_w$ cannot be defined as the portion of $U_v$ mapping to $V_w$ at time $t_w$: the orbits must
pass through c.

16 The trajectories form a language L(T) over the alphabet of atom labels. Its growth rate plays a
key role in the analysis and is bounded via the word-entropy.

The global coding tree. Let I denote the interval [−1, 1]. Since not all perturbations
$\delta$ are equally good, we must understand how the coding tree T varies as a function
of $\delta$. To do that, a global approach is necessary: given $\Delta \subseteq I$, we encode the coding
trees for all $\delta \in \Delta$ into a single one, $T^\Delta$, which can be viewed as the standard coding
tree for the augmented (n + 1)-dimensional system $(x, \delta) \mapsto (f(x), \delta)$, with the phase
space $\Omega^n \times \Delta$. The sets $U_v$ and $V_v$ are now cells in $\mathbb{R}^{n+1}$. In the branching condition,
one should replace the margin $\mathcal{R}_\varepsilon$, as defined in (1), by the global margin:

$$\bigcup_{SP} \{\, (x, \delta) \in \mathbb{R}^{n+1} : |a_0 + a_1 x_1 + \cdots + a_n x_n + \delta| \le \varepsilon \,\}. \qquad (5)$$

The degree of any node is bounded by the maximum number of cells in
an arrangement of hyperplanes in $\mathbb{R}^{n+1}$. The definition of nesting can be extended,
unchanged, to this lifted system. Since a standard coding tree is just a "cross-section"
of the global one, nesting in T even for all $\delta$ does not imply nesting in $T^\Delta$.^17 The global
word-entropy $h(T^\Delta)$ is defined in the obvious way.

4.3 The arborator 

This algorithm assembles the coding tree by glueing smaller pieces together. It relies 
on a few primitives that we now describe. The direct sum and direct product are 
tensor-like operations used to attach coding trees together. The primitives absorb 
and renorm respectively prune and compress trees. We present these operations and 
assemble the dictionary that allows us to bound the coding tree's parameters as we 
parse the arborator. 

Direct sum. The coding tree $T = T_1 \oplus T_2$ models two independent systems of size
$n_1$ and $n_2$. Independence means that the systems are decoupled (no edge joins agents
from distinct groups) and oblivious (no SP discontinuity has nonzero coefficients from
both groups): this implies that the two systems can be analyzed separately; decoupling
alone is not sufficient. The phase space of the direct sum is of dimension $n = n_1 + n_2$.
A path $w_0, w_1, \ldots$ of T is a pairing of paths in the constituent trees: the node $w_t$ is of
the form $(u_t, v_t)$, where $u_t$ (resp. $v_t$) is a node of $T_1$ (resp. $T_2$) at depth t; it is a leaf
if and only if $u_t$ or $v_t$ is one: the vanishing of one group implies the vanishing of the
whole. If $w = (u, v)$ is not a leaf, then $U_w = U_u \times U_v$, and $V_w = V_u \times V_v$. The direct sum
is commutative and associative. The name comes from the fact that $P_w$ is the direct
matrix sum of $P_u$ and $P_v$:

$$P_w = P_u \oplus P_v = \begin{pmatrix} P_u & 0 \\ 0 & P_v \end{pmatrix}.$$
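The direct matrix sum is simply a block-diagonal assembly. A minimal sketch follows, assuming matrices are given as lists of rows; the function name `direct_sum` is an illustrative choice.

```python
def direct_sum(P, Q):
    """Block-diagonal matrix with P in the top-left and Q in the bottom-right.

    P and Q are square matrices given as lists of rows. If both are
    stochastic, so is the result, since each row keeps its original
    entries and is only padded with zeros.
    """
    n1, n2 = len(P), len(Q)
    top = [list(row) + [0] * n2 for row in P]
    bottom = [[0] * n1 + list(row) for row in Q]
    return top + bottom
```

For instance, the sum of a 2-by-2 stochastic matrix and the 1-by-1 matrix [1] is a 3-by-3 stochastic matrix in which the third agent never averages with the first two.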



17 Just as a region in the (X, Y)-plane need not be connected simply because all of its horizontal
cross-sections are.

• Nesting time, period, and attraction rate.

$$\nu(T) \le \max_{i=1,2} \nu(T_i) \quad \text{and} \quad p(T) \le \operatorname{lcm}_{i=1,2}\, p(T_i) \quad \text{and} \quad \theta_\alpha(T) \le \max_{i=1,2} \theta_\alpha(T_i). \qquad (6)$$

The first two inequalities are obvious, so we focus on the last one. Consider a cell
$c = c_1 \times c_2$ of $S_\nu$ and follow the path of F emanating from it: this navigation
corresponds to the parallel traversal of a path in $F_i$ from $c_i$ (we use the subscript
i = 1, 2 to refer to either one of the subsystems). Assume without loss of generality
that $q_1 \ge q_2$. By definition, to revisit an earlier node means doing likewise in each
traversal; hence $q \ge q_1$. At time $q_1$, however, both parallel traversals are already
engaged in their own respective cycles, so the node pair at time $q_1$ will be revisited
$\operatorname{lcm}(p_1, p_2)$ steps later, the time span that constitutes the period p of the direct
sum; it also follows that $q = q_1$. If $q_1 > q_2$, the traversals do not enter their cycles
at the same time, so in general, referring to (2), the matrix Q is not the direct
sum of $Q_1^{p/p_1}$ and $Q_2^{p/p_2}$ but, rather, of a shifted version $Q = Q_1^{p/p_1} \oplus (BA)^{p/p_2}$,
where $Q_2 = AB$. We easily verify that

$$\widetilde{Q} = \lim_{k \to \infty} Q^k = \widetilde{Q}_1 \oplus (B \widetilde{Q}_2 A),$$

where, as before, $\widetilde{Q}_i = \lim_{k \to \infty} Q_i^k$. The use of the $\ell_\infty$ norm allows us to verify
the bound on the attraction rate of the direct sum by checking the accuracy of the
approximation for each subsystem separately. It suffices to focus on the case of
$T_2$, which presents the added difficulty that the approximation scheme delays the
cycle entrance until $q_1$. The other difference with the approximation scheme in the
original system $T_2$ is that, since the period can be much longer, so can the sequence
$(M_k)$. In all cases, however, the approximation scheme in T as it applies to $T_2$
differs from the scheme in $T_2$ in only one substantive way. Consider the language
consisting of the words $(AB)^*$, $(AB)^*A$, $(BA)^*$, and $(BA)^*B$. One approximation
scheme involves replacing any number of "AB"s by $\widetilde{Q}_2$, while the other scheme
replaces any number of "BA"s by $B \widetilde{Q}_2 A$. Because $AB \widetilde{Q}_2 = \widetilde{Q}_2 AB = \widetilde{Q}_2$, any
application of one scheme or the other produces the same matrix.

• Word-entropy. We prove (quasi) subadditivity. Assume without loss of generality
that $\nu(T_1) \ge \nu(T_2)$. The word-entropy counts the number of shallow nodes $w =
(u, v)$. This implies that $t_u \le \nu(T_1)$, which limits the number of such nodes u to
$2^{h(T_1)}$. If all the nodes v were shallow in $T_2$, the subadditivity of word-entropy
would be immediate; but it need not be the case. If v is deep, let s(v) be its
deepest shallow ancestor. The function s may not be injective but it is at most
two-to-one. Thus,

$$h(T) \le h(T_1) + h(T_2) + 1. \qquad (7)$$
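The claim that the paired traversal has preperiod $\max(q_1, q_2)$ and period $\operatorname{lcm}(p_1, p_2)$ can be checked on small functional graphs. The sketch below is illustrative: the names `rho`, `paired_rho` and `lcm`, and the encoding of each graph as a successor dictionary, are assumptions made here.

```python
from math import gcd

def rho(successor, start):
    """Return (period, preperiod) of the sequence start, successor(start), ..."""
    first_seen = {}
    node, index = start, 0
    while node not in first_seen:
        first_seen[node] = index
        node = successor(node)
        index += 1
    preperiod = first_seen[node]
    return index - preperiod, preperiod

def paired_rho(succ1, start1, succ2, start2):
    """Period and preperiod of the parallel traversal of two functional graphs."""
    def step(pair):
        return (succ1[pair[0]], succ2[pair[1]])
    return rho(step, (start1, start2))

def lcm(a, b):
    """Least common multiple of two positive integers."""
    return a * b // gcd(a, b)
```

With one graph of preperiod 2 and period 2 (0 → 1 → 2 → 3 → 2) and another of preperiod 0 and period 3 (0 → 1 → 2 → 0), the pair sequence first repeats the node pair visited at time 2, six steps later.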

All of the relations in (6, 7) still hold when the superscript $\Delta$ is added to the coding trees.
We discuss (7) to illustrate the underlying principle. First, we provide an independent
perturbation variable $\delta_i \in \Delta$ to $T_i$ (i = 1, 2) and add it as an extra coordinate to the
state vector, thus lifting system i to dimension $n_i + 1$. By (7), the word-entropy of the
joint system $T^*$ in dimension n + 2 is at most $h(T_1^\Delta) + h(T_2^\Delta) + 1$. Second, we restrict the
system $T^*$ to the invariant hyperplane $\delta_1 = \delta_2$, which cannot increase the word-entropy,
hence $h(T^\Delta) \le h(T^*)$, as claimed.

Absorption. The direct product, which we define below, requires an intermediate 
construction. The goal is to allow the selection of nodes for removal, with an eye toward 
replacing the subtrees they root by coding trees with different characteristics. The 
selection is carried out by an operation called absorb(T), which replaces any deleted 
node by a leaf. For reasons that the flow tracker will soon clarify, such leaves are 
designated wet. An orbit that lands into one of these wet leaves is suddenly governed 
by a different dynamics, modeled by a different coding tree, so from the perspective of 
T alone, wet leaves are where orbits come to a halt. While vanishing leaves signal the 
termination of an orbit (at least from the perspective of the analysis), the wet variety 
merely indicates a change of dynamics. Here is a simple illustration: 

Example 4.3. The system consists of two independent subsystems. Suppose we
add a union of slabs, denoted by $\mathcal{R}'_\varepsilon$, to the original margin $\mathcal{R}^{o}_\varepsilon$, thus breaking the
direct-sum nature of the coding tree. In Fig. 6, $\mathcal{R}'_\varepsilon$ would consist of the two infinite
strips bordering b. We keep the transition matrices unchanged everywhere except
in cell b, which we call wet: all transition matrices are still direct (matrix) sums,
with the possible exception of $P_b$. Suppose we had available the coding tree prior
to the margin's augmentation. Let $V_v$ denote the pentagon in the figure and w be
the node associated with the trapezoid c that holds a, b, d. We need to replace w by
three nodes: two of them for a, d and one, a wet leaf, for b. The transition matrices
for a, d are both equal to the direct sum $P_c$, while $P_b$ can be arbitrary. The idea is
that b can then be made the region $U_{\mathrm{root}}$ of a new coding tree.

Minor technicality: usually, $U_{\mathrm{root}} = \Omega^n$, so the coding tree must be cropped by
substituting b for $\Omega^n$; note that b need not be an invariant set. Cropping might
involve pruning the tree but it cannot increase any of the key parameters, such as
the nesting time, the attraction rate, and the period. Absorption appeals to the fact
that we can ignore b and its wet leaf until we have fully analyzed the direct sum.
This separation is very useful, especially since absorption does not require a direct
sum (we never used the fact that the old slabs were horizontal or vertical) and is
therefore extremely general.



A crucial observation is that the nodes z created for the subcells c' of a given c (subcells
a and d in Fig. 6) have the same matrix $P_c$. As a result of all the absorptions, the tube
$(U_v, V_v)$ is split up by up to $t_v$ linearly transformed copies of the margin slabs,
hence into at most $t_v^{\,n}\, n^{O(n)}$ subcells. This compares favorably with the naive upper
bound of $n^{O(n t_v)}$ based on the sole fact that absorption at each ancestor of v produces
$n^{O(n)}$ children.

Figure 6: The original cell c of $V_v$ (bottom-center trapezoid) splits up into the wet cell b and
the dry cells a and d, both of which inherit the matrix $P_c$.

Absorption surgery

[1] If v has no leaf, create a vanishing leaf and make it a child of v.

[2] For each cell c of $V_v \setminus \mathcal{R}^{o}_\varepsilon$, let w be the child of v for c (ie, such that
f(c) = $V_w$) and let $T_w$ be the tree rooted at w. If $c \cap \mathcal{R}'_\varepsilon \ne \emptyset$, then remove $T_w$
and, for each cell c' of $c \setminus \mathcal{R}'_\varepsilon$, create a node z and make it a child of v.

    • If c' is wet, make z a wet leaf.

    • If c' is dry, reattach to z a suitably cropped copy of $T_w$.
      Set $P_z \leftarrow P_c$, $V_z \leftarrow P_z c'$, and $U_z \leftarrow U_v \cap f^{-t_v}(c')$.

Direct product. The tree $T = T_1 \otimes T_2$ models the concatenation of two systems. The
direct product is associative but not commutative. It is always preceded by a round of
absorptions at one or several nodes of $T_1$. We begin with a few words of intuition.
Consider two systems $S_1$ and $S_2$, governed by different dynamics yet evolving in the
same phase space $\Omega^n$. Given an arbitrary region $A \subseteq \Omega^n$, we define the hybrid system S
with the dynamics of $S_2$ over A and $S_1$ elsewhere. Suppose we had complete knowledge
of the coding tree $T_i$ for each $S_i$ (i = 1, 2). Could we then combine them in some ways
to assemble the coding tree T of S? To answer this question, we follow a three-step
approach:

• (i) we absorb the tree $T_1$ by creating wet leaves w for all the nodes v with $V_v \cap A \ne \emptyset$;

• (ii) we attach the roots of cropped copies of $T_2$ at the wet leaves; and

• (iii) we iterate and glue $T_1$ and $T_2$ in alternation, as orbits move back and forth in
and out of A.

Absorption, direct products, and the arborator address (i, ii, iii) in that order. The
root of $T_2$ is attached to w, but not until that tree itself has been properly cropped so
that $U_{\mathrm{root}} = V_{\mathrm{root}}(T_2) = V_w(T_1) = P_w c$, with $P_w$ given by $T_2$ and not $T_1$. To be fully
rigorous, we should write a direct product as $T_1 \otimes \{T_2\}$ since the trees $T_2$ we attach to
the wet nodes might not all be the same: the cropping might vary, as might the wet
regions.

• Nesting time and attraction rate. Bounding the nesting time of a direct product is
not merely a combinatorial matter, as was the case for direct sums: the geometry of
attraction plays a role. Even the case of absorb($\mathcal{T}_1$) demands some attention and
this is where we begin. Adding $\mathcal{R}_\varepsilon$ to the margin cannot create arbitrarily deep wet
nodes: specifically, no $v \in$ absorb($\mathcal{T}_1$) of depth at least $\max\{\nu(\mathcal{T}_1), \theta_{\alpha_a}(\mathcal{T}_1) + p\}$
can have a wet child, where $p = p(U_v)$ and $\alpha_a = \varepsilon n^{-a}$ for a large enough constant
$a$. Indeed, suppose there is such a node $v$. Pick $x \in U_v$ such that $f^{t_v}(x)$ lies in a
wet cell $c$ within $V_v$ and observe that

$$\|f^{t_v}(x) - f^{t_v - p}(x)\|_\infty \le \|f^{t_v}(x) - \Pi_{t_v\,(\mathrm{mod}\,p)}\,x\|_\infty + \|f^{t_v - p}(x) - \Pi_{t_v\,(\mathrm{mod}\,p)}\,x\|_\infty \le 2\alpha_a.$$

By our choice of $\alpha_a$, this implies that $f^{t_v}(x)$ and $f^{t_v - p}(x)$ are at a distance
apart less than the width of the margin's slabs; therefore, $f^{t_v - p}(x)$ lies in a wet
cell or in a slab. It follows that the orbit of $x$ either vanishes or comes to a (wet)
halt at a time earlier than $t_v$, so $x \notin U_v$ and we have a contradiction. It follows
that all deep nodes of $\mathcal{T}_1$ deeper than $\theta_{\alpha_a}(\mathcal{T}_1) + p(\mathcal{T}_1)$ are also deep in absorb($\mathcal{T}_1$).
With $\mathcal{T} = \mathcal{T}_1 \otimes \mathcal{T}_2$, therefore,

$$\nu(\mathrm{absorb}(\mathcal{T}_1)) \le \max\{\nu(\mathcal{T}_1),\, \theta_{\alpha_a}(\mathcal{T}_1) + p(\mathcal{T}_1)\}, \quad \text{for some } \alpha_a \ge \varepsilon n^{-O(1)}$$
$$\nu(\mathcal{T}) \le \nu(\mathrm{absorb}(\mathcal{T}_1)) + \nu(\mathcal{T}_2)$$
$$\theta_\alpha(\mathcal{T}) \le \max\{\theta_\alpha(\mathcal{T}_1),\, \nu(\mathrm{absorb}(\mathcal{T}_1)) + \theta_\alpha(\mathcal{T}_2)\} \qquad (8)$$

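• Word-entropy. Absorption can occur only at nodes $v$ of depth $t_v \le \nu(\mathrm{absorb}(\mathcal{T}_1))$.
This means that the number of nodes where wet cells can emerge is at most
$2^{h(\mathcal{T}_1)}(\theta_{\alpha_a}(\mathcal{T}_1) + p(\mathcal{T}_1))$. As we argued earlier, each such node $v$ can give birth to at
most $t_v^n\, n^{O(n)}$ new nodes, so the number of shallow nodes in $\mathcal{T}$ is (conservatively)
at most

$$\underbrace{2^{h(\mathcal{T}_1)}\bigl(\theta_{\alpha_a}(\mathcal{T}_1) + p(\mathcal{T}_1)\bigr)}_{\#v \text{ with wet child}} \times \underbrace{\bigl(\nu(\mathcal{T}_1) + \theta_{\alpha_a}(\mathcal{T}_1) + p(\mathcal{T}_1)\bigr)^n n^{O(n)}}_{\#\text{splits}/v} \times \underbrace{2^{h(\mathcal{T}_2)}}_{\#\mathcal{T}_2 \text{ nodes}}.$$

We use the fact that cropping cannot increase the word-entropy. Taking logarithms,
we find that

$$h(\mathcal{T}) \le h(\mathcal{T}_1) + h(\mathcal{T}_2) + (n + 1)\log\bigl(\nu(\mathcal{T}_1) + \theta_{\alpha_a}(\mathcal{T}_1) + p(\mathcal{T}_1)\bigr) + O(n \log n). \qquad (9)$$

Since both $\nu(\mathcal{T}_1)$ and $p(\mathcal{T}_1)$ are no greater than $2^{h(\mathcal{T}_1)}$, we can simplify the bound:

$$h(\mathcal{T}) \le (n + 2)h(\mathcal{T}_1) + h(\mathcal{T}_2) + (n + 1)\log\theta_{\alpha_a}(\mathcal{T}_1) + O(n \log n). \qquad (10)$$

We repeat our earlier observation that, by viewing the perturbation variable $\delta$ as an
extra coordinate of the state vector, the relations above still hold for global coding trees
with $n$ incremented by one.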






Renormalization. This operation is both the simplest and the most powerful in the
arborator's toolkit: the simplest because all it does is compress time by folding together
consecutive levels of $\mathcal{T}$; the most powerful because it reaches beyond Lego-like assembly to
bring the full power of algorithmic recursion into the analysis. The primitive renorm
takes disjoint subtrees of $\mathcal{T}$ and regards them as nodes of the renormalized tree. This is
done in the obvious way: if $u$ is any node in $\mathcal{T}$ with two children $v_1, v_2$, each one with
two children, $v_{11}, v_{12}$ and $v_{21}, v_{22}$, then compressing the subtree $u, v_1, v_2$ means replacing
it by a node $z$ with the same parent as $u$'s (if any) and the four children $v_{ij}$. We discuss
this process in more detail below. Although inspired by the renormalization group of
statistical physics, our approach is more general. For one thing, the compressed subtrees
may differ in size, resulting in nonuniform rescaling across $\mathcal{T}$. This lack of uniformity
rules out closed-form composition formulae for the nesting time, attraction rate, and
word-entropy of renormalized coding trees, which must then be resolved algorithmically.

4.4 The flow tracker 

We approach periodicity through the study of an important family, the block-directional
influence systems, whose agents can be ordered so that

$$P(x) = \begin{pmatrix} A(x) & C(x) \\ 0 & B(x) \end{pmatrix}, \qquad (11)$$

where $0$ denotes the $(n-m)$-by-$m$ matrix whose entries are the constant function $x \mapsto 0$;
in other words, in a block-directional system, no B-agent ever links to an A-agent.
Suppose that $m < n$. Wet the B-agents with water while keeping all the A-agents
dry. Whenever an edge of the communication graph links a dry agent to a wet one, the
former gets wet. Note how water flows in the reverse direction of the edges. As soon
as all agents become wet (if ever), dry them but leave the B-agents wet, and repeat
forever. The case $m = n$ is similar, with one agent designated wet once and for all. The
sequence of times at which water spreads or drying occurs plays a key role in building
the arborator.

Coupling times and renormalization scales. Let $\mathcal{T}_{m \to n-m}$ denote the coding tree
of a block-directional system consisting of $m$ (resp. $n - m$) A-agents (resp. B-agents).
The arrow indicates that no B-agent can ever link to an A-agent: $\mathcal{G}_{ij}$ is identically zero
for any B-agent $i$ and A-agent $j$. We use the notation $\mathcal{T}_{m \| n-m}$ for the decoupled case:
no edge ever joins the two groups in either direction, but the discontinuities may still
mix variables from both groups. Note that the metrical case implies full independence,
so that

$$\mathcal{T}_{m \| n-m} = \mathcal{T}_m \oplus \mathcal{T}_{n-m}.$$

Assume that $n > 1$ and $0 < m \le n$. We write $\mathcal{T}_{m \to 0}$ as $\mathcal{T}_m$. Likewise, we can always
express $\mathcal{T}_{m \to n-m}$ as $\mathcal{T}_n$, but doing so is less informative. When the initial state $x$ is
understood, we use the shorthand $G_t = \mathcal{G}(f^t(x))$ to designate the communication graph
at time $t$ and we denote by $W_t$ the set of wet agents at that time. The flow tracker is not
concerned with information exchanges among the B-agents: these are permanently wet
and, should they not exist ($m = n$), agent 1 is kept wet at all times [2.1]. Thus the set
$W_t$ of wet agents is never empty. The assignments of $t_0$ in step [2.3] divide the timeline
into epochs, time intervals during which either all agents become wet or, failing that, the
flow tracker comes to a halt (breaking out of the repeat loop at "stop"). Each epoch is
itself divided into subintervals by the coupling times $t_1 < \cdots < t_\ell$, with $W_{t_k} \subset W_{t_k+1}$.
The last coupling time $t_\ell$ marks either the end of the flow tracking (if not all A-agents
become wet) or one less than the next value of $t_0$ in the loop.

The notion of coupling is purely syntactical, being only a matter of information 
transfer. Our interest in it is semantic, however: as befits a dissipative system, a certain 
quantity, to which we shall soon return, can be bounded by a decreasing function of 
time. To get a handle on that quantity is the main purpose of the flow tracker. 

Flow tracking in action. Suppose that, for a long period of time, the wet agents fail 
to interact with any dry one. The two groups can then be handled recursively. While 
this alone will not tell us whether dry-wet interaction is to occur ever again, it will 
reveal enough fine-grained information about the groups' behavior to help us resolve 
that very question. Suppose that such interaction takes place, to be followed by another
long period of non-interaction. Renormalization squeezes these "non-interactive" periods
into single time units, thus providing virtual time scales over which information flows 
at a steady rate across the system. Thus, besides analyzing subsystems recursively, 
renormalization brings uniformity to the information transfer rate. 



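Flow tracker

[1] $t_0 \leftarrow 0$.

[2] Repeat forever:

    [2.1] If $m < n$ then $W_{t_0} \leftarrow \{m + 1, \ldots, n\}$ else $W_{t_0} \leftarrow \{1\}$.
    [2.2] For $t = t_0, t_0 + 1, \ldots, \infty$

              $W_{t+1} \leftarrow W_t \cup \{\, i \mid \exists\, (i, j) \in G_t \;\&\; j \in W_t \,\}$.

    [2.3] If $|W_\infty| = n$ then $t_0 \leftarrow \min\{\, t > t_0 : |W_t| = n \,\}$ else stop.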
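
The procedure above runs forever; the following Python sketch is a finite-horizon variant written for illustration, not the paper's implementation. It assumes agents are numbered 1 to n with the A-agents first, and that an edge `(i, j)` means agent i links to agent j, so that water flows from j to i. The function name, the return values, and the sample graph sequence are hypothetical; the sequence was chosen only so that the coupling times come out as 3, 6, 10 for a system with three A-agents and one B-agent.

```python
def flow_tracker(graphs, n, m):
    """Track wet sets over a finite prefix G_0, ..., G_{T-1} of the graph sequence.

    Returns (wet_sets, coupling_times, epoch_starts) with wet_sets[t] = W_t.
    """
    def initial_wet():
        # Step [2.1]: the B-agents are wet; if there are none, agent 1 is.
        return set(range(m + 1, n + 1)) if m < n else {1}

    wet = initial_wet()
    wet_sets = [set(wet)]
    coupling_times, epoch_starts = [], [0]
    for t, graph in enumerate(graphs):
        if len(wet) == n:
            # Step [2.3]: every agent is wet, so a new epoch starts at t_0 = t.
            epoch_starts.append(t)
            wet = initial_wet()
        # Step [2.2]: a dry agent i becomes wet if it links to a wet agent j.
        newly_wet = {i for (i, j) in graph if j in wet and i not in wet}
        if newly_wet:
            coupling_times.append(t)
            wet |= newly_wet
        wet_sets.append(set(wet))
    return wet_sets, coupling_times, epoch_starts


# Hypothetical sequence with a=1, b=2, c=3 (A-agents) and d=4 (B-agent).
graphs = [
    {(1, 2), (2, 3)}, {(2, 1), (2, 3)}, {(1, 2), (3, 2)},
    {(1, 4), (2, 1), (3, 2)},
    {(1, 4), (1, 2), (2, 3)}, {(1, 2), (2, 3)},
    {(1, 4), (2, 1), (3, 2)},
    {(1, 2), (2, 3)}, {(1, 4), (2, 1)}, {(1, 4), (1, 2), (2, 3)},
    {(1, 4), (2, 1), (3, 2)},
]
wet_sets, coupling_times, epoch_starts = flow_tracker(graphs, n=4, m=3)
```

Note that an edge from a dry agent to another dry agent never spreads water, which is why the wet set grows by one agent at a time in this run.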



Example 4.4. The third column below lists a graph sequence $G_0, \ldots, G_{11}$ in chronological
order, with the superscript $w$ indicating the edges through which water propagates
to dry nodes. The system is block-directional with three A-agents labeled
$a, b, c$ and one B-agent labeled $d$. For clarity, we spell out the agents by writing the
corresponding coding tree $\mathcal{T}_{3 \to 1}$ as $\mathcal{T}_{abc \to d}$, instead, thus indicating that no edge
may link $d$ to any of $a, b, c$.

    Flow tracking    Wet set                    Graph                                            renorm

                     $W_0 = \{d\}$              $d \quad a \to b \to c$
                     $W_1 = \{d\}$              $d \quad a \leftarrow b \to c$                   $\underline{\mathcal{T}_{d \| abc}}$
                     $W_2 = \{d\}$              $d \quad a \to b \leftarrow c$

    $t_1 = 3$        $W_3 = \{d\}$              $d \xleftarrow{w} a \leftarrow b \leftarrow c$   $\mathcal{T}_{abcd}$

                     $W_4 = \{a, d\}$           $d \leftarrow a \to b \to c$
                     $W_5 = \{a, d\}$           $d \quad a \to b \to c$                          $\underline{\mathcal{T}_{a \to bcd}}$

    $t_2 = 6$        $W_6 = \{a, d\}$           $d \leftarrow a \xleftarrow{w} b \leftarrow c$   $\mathcal{T}_{abcd}$

                     $W_7 = \{a, b, d\}$        $d \quad a \to b \to c$
                     $W_8 = \{a, b, d\}$        $d \leftarrow a \leftarrow b \quad c$            $\underline{\mathcal{T}_{ab \to cd}}$
                     $W_9 = \{a, b, d\}$        $d \leftarrow a \to b \to c$

    $t_3 = 10$       $W_{10} = \{a, b, d\}$     $d \leftarrow a \leftarrow b \xleftarrow{w} c$   $\mathcal{T}_{abcd}$

                     $W_{11} = \{a, b, c, d\}$  $d \quad a \quad b \quad c$                      $\underline{\mathcal{T}_{d \| abc}}$

In the first renormalized 3-step phase, the system "waits" for an edge from $\{a, b, c\}$
to $d$, and so can be modeled as $\mathcal{T}_{d \| abc}$. In the metrical case, this is further reducible
to $\mathcal{T}_d \oplus \mathcal{T}_{abc}$. The times $t_1, t_2, t_3$ coincide with the growth of the wet set: these are
one-step events, which are treated trivially as height-one absorbed trees. They entail
no recursion, so inductive soundness is irrelevant and writing the uninformative
$\mathcal{T}_{abcd}$ is harmless. The other renormalized phases are counterintuitive and should
be discussed. Take the last one: it might be tempting to renormalize it as $\mathcal{T}_{abd \to c}$
to indicate that the phase awaits the wetting of $c$ (with $a, b, d$ already wet). This
strategy is inductively unsound, however, as it attempts to resolve a system $\mathcal{T}_{abc \to d}$
by means of another one, $\mathcal{T}_{abd \to c}$, of the same combinatorial type. Instead, we use
the fact that not only can no edge link $c$ to $\{a, b\}$ (by definition of the current phase)
but no edge can link $d$ to $\{a, b\}$ either (by block-directionality). This allows us to
use $\mathcal{T}_{ab \to cd}$, instead, which is inductively sound.

Renormalization, which is denoted by underlining, compresses into single time
units all the time intervals during which wetness does not spread to dry agents.
With the subscripts (resp. superscripts) indicating the time compression rates (resp.
tree height), the 11-node path of $\mathcal{T}_{abc \to d}$ matching the graph sequence above can
be expressed as

$$\underline{\mathcal{T}_{d \| abc}}_{\,|3} \otimes \mathcal{T}^{\,|1}_{abcd} \otimes \underline{\mathcal{T}_{a \to bcd}}_{\,|2} \otimes \mathcal{T}^{\,|1}_{abcd} \otimes \underline{\mathcal{T}_{ab \to cd}}_{\,|3} \otimes \mathcal{T}^{\,|1}_{abcd}.$$



As the example above illustrates, the coupling time is immediately followed by a
renormalization phase of the form $\mathcal{T}_{w_k \to n - w_k}$, where $w_k = |W_{t_k+1}| - n + m$ is the
renormalization scale ($k = 1, \ldots, \ell - 1$). Thus, any path of the coding tree can be
renormalized as

$$\mathcal{T}_{m \to n-m} \Longrightarrow \underline{\mathcal{T}_{m \| n-m}}_{\,|t_1} \otimes \mathcal{T}^{\,|1}_{m \to n-m} \otimes \Bigl\{ \bigotimes_{k=1}^{\ell-1} \bigl( \underline{\mathcal{T}_{w_k \to n-w_k}}_{\,|t_{k+1}-t_k-1} \otimes \mathcal{T}^{\,|1}_{m \to n-m} \bigr) \Bigr\} \otimes \mathcal{T}_{m \to n-m}. \qquad (12)$$

The recursion comes in two forms: as calls to inductively smaller subsystems $\mathcal{T}_{w_k \to n-w_k}$,
and as a rewriting rule, $\mathcal{T}_{m \to n-m} \Longrightarrow \cdots \{\ \} \otimes \mathcal{T}_{m \to n-m}$. It is the latter that makes the
arborator, if expanded in full, an infinitely long expression. We note that all these
derivations easily extend to the global coding trees.
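
As an illustration of how one epoch is rewritten, the Python sketch below assembles the renormalized expression as a plain string from the coupling times and the sizes of the wet sets. It is a hypothetical helper, not part of the paper: it assumes the epoch starts at time 0, that the first factor is compressed at rate $t_1$, and it writes `T[p->q]`, `T[p||q]`, `|r` and `^1` for the directional tree, the decoupled tree, a compression rate and a height-one tree.

```python
def renormalized_path(n, m, coupling_times, wet_sizes_after):
    """Spell out one epoch of the arborator as a string.

    coupling_times  = [t_1, ..., t_l]
    wet_sizes_after = [|W_{t_1 + 1}|, ..., |W_{t_l + 1}|]
    """
    t = coupling_times
    one_step = f"T[{m}->{n - m}]^1"           # height-one tree at a coupling time
    parts = [f"T[{m}||{n - m}]|{t[0]}", one_step]
    for k in range(1, len(t)):
        w_k = wet_sizes_after[k - 1] - n + m  # renormalization scale
        rate = t[k] - t[k - 1] - 1            # length of the waiting phase
        parts.append(f"T[{w_k}->{n - w_k}]|{rate}")
        parts.append(one_step)
    return " (x) ".join(parts)


# Three A-agents, one B-agent, coupling times 3, 6, 10 as in Example 4.4.
expression = renormalized_path(4, 3, [3, 6, 10], [2, 3, 4])
```

With these inputs the scales are 1 and 2 and the waiting phases last 2 and 3 steps, matching the phases in which one, then two, A-agents are wet.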



5 Bidirectional Systems 

We begin our proof of the bidirectional case of Theorem 1.1 by establishing a weaker
result for metrical systems: recall that these make the presence of an edge between 
two agents a sole function of their distance. The proof is almost automatic and a good 
illustration of the algorithmic machinery we have put in place. By appealing to known 
results on the total s-energy, we are able to improve the bounds and extend them to the 
nonmetrical case. 



5.1 The metrical case 

It is worth noting that, even for this special case, perturbations are required for any 
uniform convergence rate to hold. 



Example 5.1. Consider the 3-agent system:

$$\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} \mapsto \frac{1}{3} \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix},$$

with $x_3 \mapsto \frac{1}{2}(x_2 + x_3)$ if $x_3 - x_2 \ge 1$ and $x_3 \mapsto x_3$ else. Initialize the system with
$x_2 = -x_1 = 1$ and $x_3$ slightly bigger than $x_2$. The edge joining agents 2 and 3 will
then appear only after on the order of $|\log(x_3 - x_2)|$ steps, which implies that the
convergence time cannot be bounded uniformly without perturbation.
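
The logarithmic delay can be checked directly. The sketch below is an illustrative simulation, not the paper's code: it assumes the threshold test is $x_3 - x_2 \ge 1$ and uses exact rational arithmetic; the function name and the sample gaps are hypothetical.

```python
from fractions import Fraction


def steps_until_edge(gap):
    """Count steps before the edge between agents 2 and 3 first appears.

    Start from x1 = -1, x2 = 1, x3 = 1 + gap; agents 1 and 2 average with
    weights (2/3, 1/3) while agent 3 stays put until x3 - x2 >= 1.
    """
    x1, x2, x3 = Fraction(-1), Fraction(1), Fraction(1) + gap
    steps = 0
    while x3 - x2 < 1:
        x1, x2 = (2 * x1 + x2) / 3, (x1 + 2 * x2) / 3
        steps += 1
    return steps


# x2 shrinks by a factor 3 per step, so a gap of 3^-k forces a wait of k steps.
delay_small = steps_until_edge(Fraction(1, 3 ** 5))
delay_tiny = steps_until_edge(Fraction(1, 3 ** 20))
```

Since the gap can be made arbitrarily small, the delay is unbounded over the set of initial states, which is the point of the example.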



Fix $\delta$ in $\Delta = (n^{-b} I) \setminus (n^{-2b} I)$, where $I = [-1, 1]$ and $b$ is a suitably large constant.$^{18}$
The margin slabs of a metrical system are of the form $|a_0 + x_i - x_j + \delta| \le \varepsilon$. Because $a_0$ is
an $O(\log n)$-bit rational, as long as $\varepsilon < n^{-3b}$, $x$ cannot lie in that slab if $|x_i - x_j| \le n^{-3b}$.
Let diam$(s)$ be the diameter of the system after the $s$-th epoch. From (14) in [17], we
conclude that water propagation to all the agents entails the shrinking of the system's
diameter by at least a factor of $1 - n^{-O(n)}$. Since an epoch witnesses the wetting of all
the agents, repeated applications of this principle yield

$$\mathrm{diam}(s) \le e^{-s\, n^{-O(n)}}. \qquad (13)$$

After $n^{cn}$ epochs have elapsed (if ever), for a large enough constant $c$, the diameter of
the system falls beneath $n^{-3b}$ and, by convexity, never rises again. By our previous
observation, the orbit can never hit a margin subsequently. The maximum time it takes
for $n^{cn}$ epochs to elapse, over all $x \in \Omega^n$ and $\delta \in \Delta$, is an upper bound on the nesting
time of the global coding tree. Furthermore, past that time, the communication graph
is frozen, meaning that it can never change again.



18 Recall that ideally $\Delta$ should be $\{0\}$, so the more confined around 0 we can make it the better; thus
a higher value of $b$ is an asset, not a drawback.
Lemma 5.1. If $P$ is the transition matrix associated with the undirected communication
graph $G$, there is a matrix $\Pi$ such that $\|P^k - \Pi\|_{\max} = e^{-k n^{-O(n)}}$, for any $k > 0$.

Proof. By repeating the following argument for each connected component if needed, we
can assume that $G$ is connected. The positive diagonal ensures that $P$ is primitive (being
the stochastic matrix of an irreducible, aperiodic Markov chain), hence $P^n$, which we
denote by $M$, is positive. Since each nonzero entry of $P$ is at least $n^{-O(1)}$, the coefficient
of ergodicity of $M$, defined as

$$\beta = \frac{1}{2} \max_{i,j} \sum_{l} |M_{il} - M_{jl}| = 1 - \min_{i,j} \sum_{l} \min\{M_{il}, M_{jl}\},$$

satisfies $\beta \le 1 - n^{-O(n)}$. Two classic results from the theory of nonnegative matrices [49]
hold that $\beta$ is an upper bound on the second largest eigenvalue of $M$ (in absolute value)
and that $\beta$ is submultiplicative.$^{19}$ Given any probability distribution $x$, if $y = M^l x$,
then

$$\max_{i,j} |y_i - y_j| \le \beta^l \max_{i,j} |x_i - x_j| \le e^{-l\, n^{-O(n)}}. \qquad (14)$$

By Perron-Frobenius and the ergodicity of $P$, its powers tend to the rank-one matrix
$\mathbf{1} v^T$, where $v$ is the dominant left-eigenvector of $P$ with unit $\ell_1$-norm; furthermore,

$$\|P^k - \mathbf{1} v^T\|_{\max} = e^{-k\, n^{-O(n)}}.$$

Indeed, setting $x$ to the $j$-th basis vector $(0, \ldots, 1, \ldots, 0)^T$ in (14) shows that the $j$-th
column of $M^l = P^{nl}$, for $l = \lfloor k/n \rfloor$, consists of identical entries plus or minus a term
in $e^{-l\, n^{-O(n)}}$. By convexity, these near-identical entries cannot themselves oscillate as $l$
grows. Indeed, besides (14), it is also true that $[\min_i y_i, \max_i y_i] \subseteq [\min_i x_i, \max_i x_i]$. □
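
The two expressions for the coefficient of ergodicity, and the contraction it induces, can be verified on a small instance. The following Python sketch is an illustration under stated assumptions, not the paper's code: the 3-by-3 matrix `P` (a lazy walk on a path with positive diagonal) and all helper names are hypothetical, and exact fractions are used so that the comparisons are not affected by rounding.

```python
from fractions import Fraction
from itertools import combinations


def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def ergodicity_coefficient(M):
    """beta = (1/2) max over row pairs of the l1-distance between the rows."""
    n = len(M)
    return max(sum(abs(M[i][l] - M[j][l]) for l in range(n)) / 2
               for i, j in combinations(range(n), 2))


def ergodicity_coefficient_min_form(M):
    """Equivalent form: 1 - min over row pairs of the overlap sum_l min(M_il, M_jl)."""
    n = len(M)
    return 1 - min(sum(min(M[i][l], M[j][l]) for l in range(n))
                   for i, j in combinations(range(n), 2))


def spread(x):
    return max(x) - min(x)


half, third = Fraction(1, 2), Fraction(1, 3)
P = [[half, half, 0], [third, third, third], [0, half, half]]
M = mat_mul(mat_mul(P, P), P)            # M = P^n with n = 3 is positive
beta = ergodicity_coefficient(M)

x = [Fraction(1), Fraction(0), Fraction(0)]   # first basis vector
y = [sum(M[i][l] * x[l] for l in range(3)) for i in range(3)]
```

Here `y` is the first column of `M`, so the inequality `spread(y) <= beta * spread(x)` is the one-step case of (14) applied to a basis vector, as in the proof.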



The next step in deriving the coding tree's parameters is to specialize the arborator's
expression (12) to the metrical case. The outer product enumerates the first epochs
leading to the combinatorial (but not physical) "freezing" of the system. The coupling
times and renormalization scales might vary from one epoch to the next; to satisfy the
rewriting rule below, we set $w_0 = 1$ and $t_0 = -1$. The cropped coding tree $\mathcal{T}_n^*$ models
the post-freezing phase.

$$\mathcal{T}_n \Longrightarrow \Bigl\{ \bigotimes_{s=1}^{n^{O(n)}} \bigotimes_{k=0}^{\ell-1} \bigl( (\mathcal{T}_{w_k} \oplus \mathcal{T}_{n-w_k})_{|t_{k+1}-t_k-1} \otimes \mathcal{T}_n^{\,|1} \bigr) \Bigr\} \otimes \mathcal{T}_n^*. \qquad (15)$$

The following derivations entail little more than looking up the dictionary compiled
earlier.

19 The stochastic matrix $P$ may not correspond to a reversible Markov chain and might not be
diagonalizable. It is primitive, however; therefore, by Perron-Frobenius, it has unique left and right unit
eigenvectors associated with the dominant eigenvalue 1.
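Nesting time and attraction rate. It is convenient to define

$$\mu_\alpha(\mathcal{T}) = \max\{\, \nu(\mathcal{T}),\, \theta_\alpha(\mathcal{T}) \,\}.$$

If the coding trees $\mathcal{T}_1, \ldots, \mathcal{T}_k$ have period one then, by (8),

$$\mu_\alpha(\mathcal{T}_1 \otimes \cdots \otimes \mathcal{T}_k) \le k + \sum_{i=1}^{k} \mu_{\alpha_a}(\mathcal{T}_i) + \max_{1 \le i \le k} \mu_\alpha(\mathcal{T}_i). \qquad (16)$$

The coding tree $\mathcal{T}_n^*$ involves a single matrix whose powers converge to a fixed
matrix $\Pi$ and, by Lemma 5.1, $\mu_\alpha(\mathcal{T}_n^*) \le n^{O(n)} \log\frac{1}{\alpha}$. The following bounds derive
from monotonicity and successive applications of (6, 8). For some suitable $\alpha_a = \varepsilon n^{-O(1)}$
and $\alpha < \alpha_a$,

$$\mu_\alpha(\mathcal{T}_n) \le \mu_{\alpha_a}(\mathcal{T}_n^*) + n^{O(n)} \sum_{k=1}^{n-1} \max\{\, \mu_{\alpha_a}(\mathcal{T}_k),\, \mu_{\alpha_a}(\mathcal{T}_{n-k}) \,\} + \max_k \{\, \mu_\alpha(\mathcal{T}_k),\, \mu_\alpha(\mathcal{T}_n^*) \,\}$$
$$\le n^{O(n)} \mu_{\alpha_a}(\mathcal{T}_{n-1}) + \mu_\alpha(\mathcal{T}_{n-1}) + n^{O(n)} \log\tfrac{1}{\alpha}$$
$$\le n^{O(n^2)} \log\tfrac{1}{\varepsilon} + n^{O(n)} \log\tfrac{1}{\alpha}. \qquad (17)$$

In view of this last upper bound, the condition $\alpha < \alpha_a$ can be relaxed to $\alpha < 1$.
Thus,

$$\nu(\mathcal{T}_n) \le n^{O(n^2)} \log\tfrac{1}{\varepsilon} \quad \text{and} \quad \theta_\alpha(\mathcal{T}_n) \le n^{O(n^2)} \log\tfrac{1}{\varepsilon} + n^{O(n)} \log\tfrac{1}{\alpha}. \qquad (18)$$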

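Word-entropy. By (9) and the attraction rate bound above, for $0 < \varepsilon < 1/2$,

$$h(\mathcal{T}_1 \otimes \mathcal{T}_2) \le h(\mathcal{T}_1) + h(\mathcal{T}_2) + (n + 1)\log(2\mu_{\alpha_a}(\mathcal{T}_1) + 1) + O(n \log n)$$
$$\le h(\mathcal{T}_1) + h(\mathcal{T}_2) + (n + 1)\log\log\tfrac{1}{\varepsilon} + O(n^3 \log n). \qquad (19)$$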
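
For the reader's convenience, here is a sketch of the arithmetic behind the second inequality, written under the assumption that the bound (18) is applied with $\alpha = \alpha_a = \varepsilon n^{-O(1)}$, so that $\mu_{\alpha_a}(\mathcal{T}_1) \le n^{O(n^2)} \log\frac{1}{\varepsilon}$ for $0 < \varepsilon < 1/2$; the intermediate step is ours and is not spelled out in the text.

```latex
\begin{align*}
(n+1)\log\bigl(2\mu_{\alpha_a}(\mathcal{T}_1) + 1\bigr)
  &\le (n+1)\log\Bigl(n^{O(n^2)} \log\tfrac{1}{\varepsilon}\Bigr) \\
  &= (n+1)\,O(n^2 \log n) + (n+1)\log\log\tfrac{1}{\varepsilon} \\
  &= (n+1)\log\log\tfrac{1}{\varepsilon} + O(n^3 \log n).
\end{align*}
```

The $O(n \log n)$ term of the first line is absorbed by $O(n^3 \log n)$.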

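By (15) and $h(\mathcal{T}_n^*) = 0$, it follows that

$$h(\mathcal{T}_n) \le \sum_{s=1}^{n^{O(n)}} \sum_{k=1}^{n-1} \bigl\{ h\bigl((\mathcal{T}_k \oplus \mathcal{T}_{n-k}) \otimes \mathcal{T}_n^{\,|1}\bigr) \bigr\} + h(\mathcal{T}_n^*) + (n + 1)\log(2\mu_{\alpha_a}(\mathcal{T}_n^*) + 1) + n^{O(n)} \log\log\tfrac{1}{\varepsilon}$$
$$\le n^{O(n)} h(\mathcal{T}_{n-1}) + n^{O(n)} \log\log\tfrac{1}{\varepsilon} \le n^{O(n^2)} \log\log\tfrac{1}{\varepsilon}.$$

Our earlier observation that such derivations apply to the global coding trees tells
us that

$$h(\mathcal{T}_n^\Delta) \le n^{O(n^2)} \log\log\tfrac{1}{\varepsilon}. \qquad (20)$$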



Note the crucial fact that, from the vantage point of (18, 20), the global word-entropy
is lower than the nesting time, which shows that the coding tree's average node degree is less
than 2. By Lemmas 4.4 and 4.6, any vanishing point $x$ hits an enlarged margin fairly
early: $f^{t_a}(x) \in \mathcal{R}_{2\varepsilon}$ for $t_a \le \theta_{\alpha_a} + \nu$ and some $\alpha_a \ge \varepsilon n^{-O(1)}$; therefore,

$$t_a \le \theta_{\alpha_a}(\mathcal{T}_n^\Delta) + \nu(\mathcal{T}_n^\Delta) \le |\log\varepsilon|\, n^{O(n^2)}. \qquad (21)$$

For random $\delta \in \Delta$, a fixed point lies in a given slab of $\mathcal{R}_{2\varepsilon}$ with probability at most
$4\varepsilon/(2n^{-b} - 2n^{-2b})$; by a union bound over the margin slabs, the probability of being in
$\mathcal{R}_{2\varepsilon}$ does not exceed $\varepsilon n^{O(1)}$. Therefore, the probability that a fixed $x$ ever vanishes is
at most $\varepsilon n^{O(1)}$ times the number of paths of depth at most $t_a$ in the global coding tree
$\mathcal{T}_n^\Delta$, which, by (21), is

$$|\log\varepsilon|\, n^{O(n^2)}\, 2^{h(\mathcal{T}_n^\Delta)}.$$

By (20), this puts the vanishing probability at

$$\varepsilon\, n^{O(n^2)} \bigl(\log\tfrac{1}{\varepsilon}\bigr)^{n^{O(n^2)}} \le \sqrt{\varepsilon},$$

for $\varepsilon$ small enough, which means that it can be set arbitrarily low. Removing a small
interval in the middle of $n^{-b} I$ to form $\Delta$ was only useful for the analysis: in practice,
we might as well pick the random perturbation uniformly in $n^{-b} I$ since it would add
only $n^{-b}$ to the error probability. The merit of the proof is that it is a straightforward,
automatic application of the arborator's dictionary. It illustrates the power of
renormalization, which can be seen in the fact that no explicit bound on $t_{k+1} - t_k$ is ever
needed. By appealing to known results about the total s-energy [17], we can both extend
and improve the bound on the convergence rate.

5.2 The bidirectional case 

To give up the metrical assumption means that the presence of an edge in the
communication graph no longer depends on its two agents alone but possibly on all of them. In
such a system, for example, two agents might be joined by an edge if and only if fewer
than ten percent of them lie in between. We revisit the previous argument and show how
to extend it to general bidirectional systems. We retain the ability of the communication
graph to freeze when the agents' diameter becomes negligible by enforcing the agreement
rule: $\mathcal{G}_{ij}$ is constant over the slab $|x_i - x_j| \le n^{-bn}$, for some suitably large constant $b$.
The difficulty with nonmetrical dynamics is that, though decoupled, subsystems are no
longer independent, so in (15) the direct sum $\mathcal{T}_{w_k} \oplus \mathcal{T}_{n-w_k}$ is no longer operative.

We set $\Delta = n^{-b} I$ and fix $x \in \Omega^n$ for the time being. This induces a length on each
edge of any communication graph $\mathcal{G}(f^t(x))$, so we can call a node $v$ of $\mathcal{T}_n^\Delta$ heavy if its
communication graph contains one or more edges of length at least $n^{-2bn}$. The number
of times the communication graph has at least one edge of length $\lambda$ or more is called
the communication count $C_\lambda$: it has been shown, using the total s-energy [17], that
$C_\lambda \le \lambda^{-1} \rho^{-O(n)}$, where $\rho$ is the smallest nonzero entry in the stochastic matrices; here
$\rho \ge n^{-O(1)}$. It follows that, along any path of $\mathcal{T}_n$, the number of heavy nodes is $n^{O(n)}$.
Let us follow one such path and let $G^k$ denote the communication graph common to the
subpath between the $k$-th and $(k + 1)$-st heavy nodes. To see why that graph is unique,
suppose two consecutive light-node graphs are different. Then some $(i, j)$ is an edge of
one but not the other. But, since the first graph only has edges of length less than $n^{-2bn}$,
the locations of both $i$ and $j$ cannot vary by more than $n^{-2bn}$ between the two graphs.
It means that in both graphs their distance cannot exceed $3n^{-2bn} < n^{-bn}$; therefore,
by the agreement rule, $(i, j)$ is an edge in both graphs, which is a contradiction. We
rewrite (12), for fixed $x$, as

$$\mathcal{T}_n^\Delta \Longrightarrow \Bigl\{ \bigotimes_{k=1}^{n^{O(n)}} \bigl( \mathcal{T}^{*}_{n\,|G^k} \otimes \mathcal{T}_n^{\Delta\,|1} \bigr) \Bigr\} \otimes \mathcal{T}^{*}_{n\,|G^\infty},$$

where $G^\infty$ is the final graph, which forms an infinite suffix of the graph sequence
$(\mathcal{G}(f^t(x)))_{t \ge 0}$. We reduce unnecessary branching as follows: whenever $V_v$ (which, with
$x$ fixed, is an interval along the $\delta$-axis) is split into two or more cells by the switching
partition, we give it two or more children (besides vanishing leaves) only if at least one
of these cells corresponds to a heavy node. The reasoning is that, in the absence of
heavy nodes, splitting $U_v$ into subcells is pointless since the communication graphs of
all the children are the same; so we might as well give $v$ a single child and, if need
be, a vanishing leaf. This ensures that the nesting time of $\mathcal{T}^{*}_{n\,|G^k}$ is 0. By Lemma 5.1,
$\mu_\alpha(\mathcal{T}^{*}_{n\,|G^k}) \le n^{O(n)} \log\frac{1}{\alpha}$ and, by (16),

$$\mu_\alpha(\mathcal{T}_n^\Delta) \le n^{O(n)} + \mu_{\alpha_a}(\mathcal{T}^{*}_{n\,|G^\infty}) + \sum_{k=1}^{n^{O(n)}} \mu_{\alpha_a}(\mathcal{T}^{*}_{n\,|G^k}) + \max\{\, \mu_\alpha(\mathcal{T}^{*}_{n\,|G^k}),\, \mu_\alpha(\mathcal{T}^{*}_{n\,|G^\infty}) \,\} \le n^{O(n)} \log\tfrac{1}{\alpha\varepsilon}.$$

Since $\theta_\alpha(\mathcal{T}^{*}_{n\,|G^k}) \le n^{O(n)} \log\frac{1}{\alpha}$ and $h(\mathcal{T}^{*}_{n\,|G^k}) = 0$, by (9), inequality (19) becomes

$$h(\mathcal{T}_1 \otimes \mathcal{T}_2) \le h(\mathcal{T}_1) + h(\mathcal{T}_2) + (n + 1)\log\log\tfrac{1}{\varepsilon} + O(n^2 \log n);$$

therefore, $h(\mathcal{T}_n^\Delta) \le n^{O(n)} \log\log\frac{1}{\varepsilon}$. Repeating the argument we used for the metrical
case bounds the vanishing probability of $x$, for $\varepsilon < 2^{-n^{c}}$ and constant $c$ large enough.
The attraction rate is at most $n^{O(n)} \log\frac{1}{\alpha}$, for any $\alpha < \varepsilon$, and the proof of the
bidirectional case of Theorem 1.1 is complete. □



6 General Influence Systems 

We prove Theorem 1.1. The centerpiece of our proof is the bifurcation analysis of a
certain non-Markovian extension of an influence system. We focus on that extension first
and then show how it relates to the original system. We impose a timeout mechanism to
prevent any edge from reappearing after an absence of $t_o$ consecutive steps, for arbitrarily
large $t_o$. Fix a directed graph $H$ with $n$ nodes labeled 1 through $n$. Given any $x \in \Omega^n$, as
soon as either the communication graph $\mathcal{G}(f^t(x))$ contains an edge not in $H$ or some edge
of $H$ fails to appear in at least one of $\mathcal{G}(f^{t-t_o+1}(x)), \ldots, \mathcal{G}(f^t(x))$ for some $t \ge t_o$, set
all future communication graphs to be the trivial graph consisting of $n$ self-loops. This
creates a new coding tree, still denoted $\mathcal{T}_n$ for convenience, which has special switching
leaves associated with the trivial communication graph. We show that, almost surely,
the orbit of any point is attracted to a limit cycle or its path in the coding tree reaches
a switching leaf.$^{20}$ As in the bidirectional case, we assume the agreement rule, which
sets $\mathcal{G}_{ij}$ to a constant function over the thin slab $|x_i - x_j| \le n^{-bn}$.

What is $H$? Any infinite graph sequence such as $\mathcal{G}(x), \mathcal{G}(f(x)), \mathcal{G}(f^2(x))$, etc., defines
a unique persistent graph, which consists of all the edges that appear infinitely often.
The timeout mechanism allows an equivalent characterization, which includes the edges
appearing at least once every $t_o$ steps. The persistent graph depends on the initial state
and is unknown ahead of time, so our analysis must handle all possible such graphs.
While it plays a key role in the analysis, it would be wrong to think of the persistent
graph as determining the dynamics: influence systems can be chaotic and nontrivially
periodic, two behaviors that can never be found in systems based on a single graph.

Consider the directed graph derived from $H$ by identifying each strongly connected
component with a single node. Let $B_1, \ldots, B_r$ be the components whose corresponding
nodes are sinks and let $n_i$ denote the number of agents in the group $B_i$; write $n =
m + n_1 + \cdots + n_r$. (In Markov chain terminology, $B_i$ is a closed communicating class.)
The linear subspace spanned by the agents of each $B_i$ is forward-invariant and, as we
shall see, the phase space evolves toward a subspace of rank $r$. We reserve the indices
$1, \ldots, m$ to denote the agents outside of the $B_i$'s. Unless they hit a vanishing or switching
node, the agents indexed $m + 1, \ldots, n$ are expected to settle eventually, while the other
agents orbit around them, being attracted to a limit cycle. We shall see that nontrivial
periodicity is possible only if $r > 1$. We are left with a block-directional system with
$m$ (resp. $n - m$) A-agents (resp. B-agents), and the former exercising no influence
on the latter (§4.4). It follows from (11) that, for each node $v$ of the global coding
tree $\mathcal{T}_{m \to n-m}$,

$$P_{\le v} = \begin{pmatrix} A_{\le v} & C_{\le v} \\ 0 & B_{\le v} \end{pmatrix}. \qquad (22)$$
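
The decomposition of the agents into the groups $B_1, \ldots, B_r$ and the remaining A-agents depends only on the graph $H$. The Python sketch below is a hypothetical illustration, not the paper's procedure: it finds the strongly connected components by plain reachability (adequate for small graphs) and keeps those with no outgoing edge. The sample graph and the function names are assumptions made for the example.

```python
def reachable(adj, source):
    """Set of nodes reachable from source (including source itself)."""
    seen, stack = {source}, [source]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def sink_components(nodes, edges):
    """Return (sinks, a_agents): the closed classes B_1..B_r and the other agents."""
    adj = {u: set() for u in nodes}
    for u, v in edges:
        adj[u].add(v)
    reach = {u: reachable(adj, u) for u in nodes}
    components, assigned = [], set()
    for u in nodes:
        if u in assigned:
            continue
        # Strongly connected component of u: mutual reachability.
        component = {v for v in reach[u] if u in reach[v]}
        assigned |= component
        components.append(component)
    # A component is a sink if no edge leaves it.
    sinks = [c for c in components if all(v in c for u in c for v in adj[u])]
    a_agents = set(nodes) - set().union(*sinks)
    return sinks, a_agents


# Hypothetical persistent graph on five agents.
nodes = [1, 2, 3, 4, 5]
edges = [(1, 2), (2, 1), (2, 3), (3, 4), (4, 3), (1, 5)]
sinks, a_agents = sink_components(nodes, edges)
```

In this example there are $r = 2$ closed classes, $\{3, 4\}$ and $\{5\}$, and $m = 2$ A-agents, each of which can reach a B-agent along a directed path.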

To resolve the system requires a fairly subtle bifurcation analysis which, for convenience,
we break down into four stages: in §6.1 we bound the thinning rate; in §6.2 we argue
that, deep enough in the coding tree, perturbations keep the coding tree's expected
(mean) degree below 1; in §6.3, we show how perturbed phase tubes avoid being split
by SP discontinuities at large depths; finally, in §6.4 we show how to remove the switching
leaves and do away with the persistent graph assumption. We also explain why it is
legitimate to ignore the non-Markovian aspect of the system in most of the discussion.



20 Vanishing leaves and switching leaves are distinct: the former "cover" the chaotic regions of the
system and are the places perturbations help us avoid; the switching leaves, on the other hand, represent
a change in dynamics type and plug into the roots of other coding trees.
6.1 The thinning rate

We prove that, as the depth of a node $v$ of the global coding tree grows, $A_{\le v}$ and $B_{\le v}$
tend to matrices of rank 0 and rank $r$, respectively, with the thinning rates $\gamma$ and $\gamma'$
telling us how quickly.

Lemma 6.1. Given a node $v$ of $\mathcal{T}_{m \to n-m}$, there exist vectors $z_i \in \mathbb{R}^{n_i}$ ($i = 1, \ldots, r$),
such that, for any $t_v \ge t_c = n^{c n t_o}$ and a large enough constant $c$,

$$\text{(i)}\ \ \|A_{\le v} \mathbf{1}_m\|_\infty \le e^{-\gamma t_v} \quad \text{and} \quad \text{(ii)}\ \ \|B_{\le v} - \mathrm{diag}(\mathbf{1}_{n_1} z_1^T, \ldots, \mathbf{1}_{n_r} z_r^T)\|_{\max} \le e^{-\gamma' t_v},$$

where $\gamma = 1/t_c$ and $\gamma' = n^{-cn}$.

Proof. We begin with (i). Consider the initial state $x = (\mathbf{1}_m, \mathbf{0}_{n-m})$, with all the
A-agents at 1 and the B-agents at 0, and let $y = P_{\le v}\, x$; obviously, $\|A_{\le v} \mathbf{1}_m\|_\infty = \|y\|_\infty$.
To bound the $\ell_\infty$-norm of $y$, we apply to $x$ the sequence of maps specified along the
path of $\mathcal{T}_{m \to n-m}$ from the root to $v$.$^{21}$ Referring to the arborator (12), let's analyze the
factor

$$\mathcal{T}_{w_k \to n-w_k\,|t_{k+1}-t_k-1}.$$

The wait period $t_{k+1} - t_k$ before wetness propagates again at time $t_{k+1}$ is at most $t_o$:
indeed, by definition, any A-agent can reach some B-agent in $H$ via a directed path, so
all of them will eventually get wet. It follows that the set $W_k$ cannot fail to grow in $t_o$
steps unless it already contains all $n$ nodes or the trajectory reaches a switching leaf.
Assume that the agents of $W_{t_k+1}$, the wet agents at time $t_k + 1$, lie in $(0, 1 - \sigma]$. Because
their distance to 1 can decrease by at most a polynomial factor at each step, they all lie
in $(0, 1 - \sigma n^{-O(t_o)}]$ between times $t_k$ and $t_{k+1}$. The agents newly wet at time $t_{k+1} + 1$,
i.e., those in $W_{t_{k+1}+1} \setminus W_{t_{k+1}}$, move to a weighted average of up to $n$ numbers in $(0, 1)$,
at least one of which is in $(0, 1 - \sigma n^{-O(t_o)}]$. This implies that the agents of $W_{t_{k+1}+1}$ lie
in $(0, 1 - \sigma n^{-O(t_o)}]$. Since $\sigma \le 1$, when all the A-agents are wet, which happens within
$n t_o$ steps, their positions are confined within $(0, 1 - n^{-O(n t_o)}]$. It follows that

$$\|A_{\le v} \mathbf{1}_m\|_\infty \le e^{-\lfloor t_v/(n t_o) \rfloor\, n^{-O(n t_o)}},$$

which proves (i). We establish (ii) along similar lines. Although $B_i$ and $B_j$ ($i \neq j$) are
decoupled, they are not independent; so their joint coding tree cannot be expressed as a
direct sum. The subgraph of $H$ induced by the agents of any given $B_i$ is strongly
connected, so viewed as a separate subsystem, the B-agents are newly wetted at least
once every $n t_o$ steps. By repeating the following argument for each $B_i$, we can assume,
for the purposes of this proof, that $B = B_1$, $n_1 = n - m$ and $r = 1$.

Initially, place B-agent $j$ at 1 and all the others at 0; then apply to it the sequence
of maps leading to $B_{\le v}$ (this may not be the actual trajectory of that initial state). The
previous argument shows that the entries of the $j$-th column of $B_{\le v}$, which denote the
locations of the agents at time $t_v$, are confined to an interval of length $e^{-\lfloor t_v/(n t_o) \rfloor\, n^{-O(n t_o)}}$.
By the agreement rule, this implies that the communication subgraph among the
B-agents must freeze at some time $t_c = n^{c n t_o}$ for a constant $c$ large enough, hence become
$H_{|B}$.$^{22}$ Let $\{u_i\}$ be the $n^{O(n t_c)}$ nodes of the coding tree at depth $t_c$. Any deeper node $v$
is such that $B_{\le v} = Q^{t_v - t_c} B_{\le u_i}$ for some $i$, where $Q$ is the stochastic matrix associated
with $H_{|B}$. Since that graph is strongly connected, the previous argument shows that the
entries in column $j$ of $Q^k$ lie in an interval of length $e^{-k n^{-O(n)}}$. Since $Q^{k+1}$ is derived
from $Q^k$ by taking convex combinations of the rows of $Q^k$, as $k$ grows, these intervals
are nested downwards and hence converge to a number $z_j$. It follows that $Q^k$ tends to
$\mathbf{1}_{n_1} z^T$, with $\|Q^k - \mathbf{1}_{n_1} z^T\|_{\max} \le e^{-k n^{-O(n)}}$. Doubling the value of $t_c$ yields part (ii) of
the lemma. □

21 The path need not track the orbit of $x$.

The proof suggests that, for any node $v$ deep enough in the coding tree, the matrix $A_{\le v}$
becomes an error term while $B_{\le v}$ tends to a matrix that depends only on the ancestor
of $v$ of depth $t_c$. The bifurcation analysis requires a deeper understanding of the error
term and calls for more sophisticated arguments. We state the thinning bound in terms
of the global coding tree for the perturbation interval $I = [-1, 1]$.

Lemma 6.2. Any node $v$ of $\mathcal{T}^{I}_{m \to n-m}$ of depth $t_v \ge t_c$ has an ancestor $u$ of depth $t_c$
such that

$$\Bigl\| P_{\le v} - \begin{pmatrix} 0 & C_u \\ 0 & D_u \end{pmatrix} \Bigr\|_{\max} \le e^{-\gamma t_v},$$

where $D_u$ is a stochastic matrix of the form $D_u = \mathrm{diag}(\mathbf{1}_{n_1} z_1(u)^T, \ldots, \mathbf{1}_{n_r} z_r(u)^T)$.

6.2 Sparse branching 

If we look deep enough in the coding tree for the thinning rate to "kick in," we observe
that, under random margin perturbation, the average branching factor is less than two.
Bruin and Deane observed a similar phenomenon in single-agent contractive systems [9].
Their elegant dimensionality argument does not seem applicable in our case, so we follow
a different approach, based on geometric considerations. We begin with some terminology:
Lin$[x_1, \ldots, x_n]$ refers to a real linear form over $x_1, \ldots, x_n$, with Aff$[x_1, \ldots, x_n]$
designating the affine version; in neither case may the coefficients depend on $\delta$ or on the
agent positions.$^{23}$ With $y_1, \ldots, y_r$ understood, a gap of type $\omega$ denotes an interval of
the form $a + \omega I$, where $a = $ Aff$[y_1, \ldots, y_r]$. We define the set

$$C[y_1, \ldots, y_r] = \bigl\{\, (\xi, \underbrace{y_1, \ldots, y_1}_{n_1}, \ldots, \underbrace{y_r, \ldots, y_r}_{n_r}) \;\bigm|\; \xi \in \mathbb{R}^m \,\bigr\}.$$

22 We emphasize that we are making no heuristic assumption about the repeated occurrence of the
edges of $H$: switching leaves are there precisely to allow violations of the rule.

23 For example, we can express $y = \delta + x_1 - 2x_2$ as $y = \delta + $ Lin$[x_1, x_2]$ and $y = \delta + x_1 - 2x_2 + 5$ as
$y = \delta + $ Aff$[x_1, x_2]$.






The variables $y_1, \ldots, y_r$ denote the limit positions of the B-agents: they are linear
combinations of their initial positions $x_{m+1}, \ldots, x_n$ (but functions of the full initial
state $x$). Let $v$ be a node of the global coding tree $\mathcal{T}^{I}_{m \to n-m}$. The matrix $P_{\le v}$ is a
product of the form $P_{t_v} \cdots P_0$, with $P_0 = \mathrm{Id}$ and $P_0, \ldots, P_{t_v}$ forming what we call a
valid matrix sequence. Fix a parameter $\rho > 0$ and a point $x$ in $\mathbb{R}^n$. The phase tube
formed by the cube $B = x + \rho I^n$ and the matrix sequence $P_0, \ldots, P_{t_v}$ consists of the
cells $P_0 B, \ldots, (P_{t_v} \cdots P_0) B$. It might not track any orbit from $B$ and hence have little
relation with a phase tube of the actual system. The phase tube splits at node $v$ if the
global margin $\mathcal{R}_\varepsilon$ makes $(P_{t_v} \cdots P_0 B) \setminus \mathcal{R}_\varepsilon$ disconnected. The following
result is the key to sparse branching:

Lemma 6.3. Fix $\varepsilon, \rho > 0$, $D_0 \ge 2^{(1/\gamma)^{n+1}}$, and $(y_1,\dots,y_r) \in \mathbb{R}^r$, where $\gamma = n^{-cnt_o}$. There exists a union $W$ of $n^{O(nD_0)}$ gaps of type $(\varepsilon+\rho)n^{O(n^5D_0)}$ such that, for any interval $\Delta \subseteq \mathbb{I}\setminus W$ of length $\rho$ and any $x \in C[y_1,\dots,y_r]$, the phase tube formed by the box $x + \rho\mathbb{I}^n$ along any path of $\mathcal{T}^{\Delta}_{m\to n-m}$ of length at most $D_0$ cannot split at more than $D_0^{1-\gamma^{n+1}}$ nodes.

Proof. We begin with a technical lemma which we prove later. For $k = 0,\dots,D$, let $a_k$ be a row vector in $\mathbb{R}^m$ with $O(\log n)$-bit rational coordinates and $A_k$ be an $m$-by-$m$ nonnegative matrix whose entries are rationals over $O(\log N)$ bits, for $N \ge n$. Write $v_k = a_kA_k\cdots A_0$, with $A_0 = \mathrm{Id}$, and assume that the maximum row-sum $\alpha = \max_{k>0}\|A_k\|_\infty$ satisfies $0 < \alpha < 1$. Given $I \subseteq \{0,\dots,D\}$, denote by $V_{|I}$ the matrix whose rows are, from top to bottom, the row vectors $v_k$ with the indices $k \in I$ sorted in increasing order. The following result is an elimination device meant to factor out the role of the $A$-agents. It is a type of matrix rigidity statement.

Lemma 6.4. Given any integer $D \ge 2^{(1/\beta)^{m+1}}$ and $I \subseteq \{0,\dots,D\}$ of size $|I| \ge D^{1-\beta^{m+1}}$, where $\beta = |\log\alpha|/(cm^3\log N)$ for a constant $c$ large enough, there exists a unit vector $u$ such that

$$u^TV_{|I} = 0 \quad\text{and}\quad u^T\mathbf{1} \ge N^{-cm^3D}.$$
Although $c$ is unrelated to its namesake in Lemma 6.1, we use the same constant by picking the larger of the two; in general, such constants are implied by the bit complexity of the transition matrices and the SP discontinuities. Note also that $\alpha \ge N^{-O(1)}$, so $\beta$ can be assumed to be much less than 1. To prove Lemma 6.3, we first consider the case where the splitting nodes are well separated, which allows for Lemma 6.1 to be used; then we extend this result to all cases. Given a valid matrix sequence $P_0,\dots,P_{D_0}$, choose $D \ge 2^{(1/\beta)^{m+1}}$ and pick a sequence of $D+1$ integers $0 = s_0 < \cdots < s_D \le D_0$ such that

$$D \ge 2^{(1/\beta)^{m+1}} \quad\text{and}\quad 1/\gamma \le s_k - s_{k-1} \le 3/\gamma, \qquad (23)$$



The crux of the lemma is the uniformity over $x$: only $(y_1,\dots,y_r)$ needs to be fixed.
25 The coefficients $a_k$ express the discontinuities. Being extracted from the product of several transition matrices, $A_k$ requires more bits.
for $k = 1,\dots,D$: we identify the matrix $A_k$ of Lemma 6.4 with the $m$-by-$m$ upper left principal submatrix of $P_{s_k}P_{s_k-1}\cdots P_{s_{k-1}+1}$; using the notation of (22), $A_k = A_{\le w}$, for some node $w$ (not necessarily an ancestor of $v$) of depth $t_w = s_k - s_{k-1} \ge 1/\gamma$. Thus, by Lemma 6.1, for $k > 0$, the maximum row-sum of any $A_k$ satisfies $\alpha \le 1/e$: each $A_k$ is a submatrix of a product of at most $3/\gamma$ transition matrices, so each entry is an $O(\log N)$-bit rational, with $N = n^{n^2/\gamma}$. What is the row vector $a_k$? For $k = 0,\dots,D$, pick any one of the margin slabs and denote by $a_k$ the $m$-dimensional vector of $O(\log n)$-bit rational coefficients indexed by the $A$-agents. Fix $\delta \in \mathbb{I}$ and pick $I$ in Lemma 6.4 to be of size $\lceil D^{1-\beta^{m+1}}\rceil$. Assume that, given $x \in C[y_1,\dots,y_r]$, the phase tube formed by the box $x + \rho\mathbb{I}^n$ and $P_0,\dots,P_{s_D}$ splits at every index of $I$ along the chosen slabs. In other words, for each $k \in I$, there exist a node $z_k$ of depth $t_{z_k} = s_k$ and $\rho_i = \rho_i(k)$, for $i = 1,\dots,n$, such that $|\rho_i| \le \rho$ and

$$\left| c_k + (a_k, b_k)\begin{pmatrix} A_{\le z_k} & C \\ 0 & D \end{pmatrix}(x_1+\rho_1,\dots,x_n+\rho_n)^T + \delta \right| \le \varepsilon,$$

where the chosen slab is of the form $|c_k + a_k(x_1,\dots,x_m)^T + b_k(x_{m+1},\dots,x_n)^T + \delta| \le \varepsilon$, with $b_k \in \mathbb{R}^{n-m}$ and $c_k \in \mathbb{R}$. Since $v_k = a_kA_{\le z_k}$ and $x \in C[y_1,\dots,y_r]$, it follows that

$$\left| v_k(x_1+\rho_1,\dots,x_m+\rho_m)^T + \mathrm{Aff}[y_1,\dots,y_r,\rho_{m+1},\dots,\rho_n] + \delta \right| \le \varepsilon, \qquad (24)$$

where the coefficients in the affine form are of magnitude $n^{O(1)}$.




Figure 7: The choice of slabs at the nodes causes the phase tube to split at the nodes indexed by $I = \{2, 4, 6\}$. The nodes of depth $s_k$, for $k \notin I$, are represented as black dots: $s_0, s_1, s_3, s_5, s_7$ ($D = 7$). The other nodes in the paths are the white dots.

With $m = 3$, $x_1 - x_3 + \delta = 0.2$ gives $a_k = (1, 0, -1)$ and $x_2 - x_4 + \delta = 0.7$ produces $a_k = (0, 1, 0)$.
It is immaterial that $x + \rho\mathbb{I}^n$ might slightly bulge out of the phase space $\Omega^n$.
Lemma 6.4 allows us to eliminate the variables $x_1,\dots,x_m$: we premultiply $V_{|I}$ by the unit vector $u$ to find that

$$|\mathrm{Aff}[y_1,\dots,y_r] + \delta| \le (\varepsilon+\rho)N^{O(cm^3D)}, \qquad (25)$$

where the coefficients of the new affine form are bounded by $N^{O(cm^3D)}$. (We leave the constant $c$ in the exponent to highlight its influence.) The remarkable fact is that the variable $\delta$ is assured not to vanish during the elimination. Thus, as long as $\delta$ remains outside a gap of type $(\varepsilon+\rho)N^{O(cm^3D)}$, the phase tube formed by $x + \rho\mathbb{I}^n$ and $P_0,\dots,P_{D_0}$ cannot split at every index of $I$. Counting the number of possible choices of slabs per node raises the number of gaps to $n^{O(D)}$. The argument assumes that $\delta$ has the same value in each of $|I|$ inequalities. It need not be so: each $\delta$ in (24) can be replaced by $\delta + \nu_k$ ($k \in I$), for $|\nu_k| \le \rho$, and the new system of inequalities will still imply (25). 28 A combinatorial argument shows how adding more gaps to the "exclusion zone" keeps branching low. Before proceeding with that final part of the proof, we summarize our results, using the bound $|\log\alpha| \ge \log e \ge 1$.



Lemma 6.5. Let $N = n^{n^2/\gamma}$ and $\beta = 1/(cm^3\log N)$, where $c$ is the constant of Lemma 6.4. Fix a path in $\mathcal{T}_{m\to n-m}$ from the root and pick $D+1$ nodes on it of depth $0 = s_0 < \cdots < s_D$ satisfying (23); out of these nodes, choose a subset $I$ of size $\lceil D^{1-\beta^{m+1}}\rceil$. There exists an exclusion zone $W$ consisting of the union of at most $n^{O(D)}$ gaps of type $(\varepsilon+\rho)N^{O(cm^3D)}$ such that, for any interval $\Delta \subseteq \mathbb{I}\setminus W$ of length $\rho$ and any $x \in C[y_1,\dots,y_r]$, the phase tube formed by $x + \rho\mathbb{I}^n$ cannot split at all the nodes of $I$ in $\mathcal{T}^{\Delta}_{m\to n-m}$ (assuming they exist).



The crux of the lemma is that it holds uniformly for all $x$. To prove Lemma 6.3 we need to extend the previous lemma to all the paths of the coding tree of the prescribed length and remove from (23) the lower bound of $1/\gamma$ on the distance between consecutive splitting nodes. Fix $D_0 \ge 2^{(1/\gamma)^{n+1}}$, and let $v$ be a node of $\mathcal{T}^{\mathbb{I}}_{m\to n-m}$ of depth $t_v = D_0$. Since the path is fixed, we can uniquely identify the node $v$ and its ancestors by their depths and denote by $P_t$ the transition matrix of the node at depth $t$. Define the node set $J = \{1/\gamma, 2/\gamma, \dots, D_0\}$, with $|J| = \lfloor\gamma D_0\rfloor$; recall that $1/\gamma = t_c$ is an integer. Let $K$ be the set of ancestors of $v$ at which the phase tube formed by $x + \rho\mathbb{I}^n$ and $P_0,\dots,P_{D_0}$ splits (with respect to $\mathcal{T}^{\mathbb{I}}_{m\to n-m}$); assume that

$$|K| \ge D_0^{1-\gamma^{n+1}}. \qquad (26)$$

We define $I$ to be the largest subset of $K$ with no two elements of $I \cup \{0\}$ at a distance less than $1/\gamma$; obviously, $|I| \ge \lfloor\gamma|K|\rfloor - 1$. To define $s_1,\dots,s_D$, we add all of $J$ to $I$ (to keep distances between consecutive nodes small enough) and then clean up the set to avoid distances lower than allowed: we define $J'$ to be the smallest subset of $J$ such that $L = I \cup (J\setminus J')$ contains no two elements at a distance less than $1/\gamma$. Each element of



28 This observation is crucial for the degree structure analysis to come next and the need to random- 
ize 5. 
$I$ can cause the disappearance of at most two elements in $J$ for the addition of one into $L$, hence $|J|/2 \le |L| \le \gamma D_0 + 1$. By construction, consecutive elements of $L$ are at most $3/\gamma$ away from each other, so we can identify $L$ with the sequence $s_1 < \cdots < s_D$. By $m < n$ and the specifications of $\gamma$ in Lemma 6.1 and $N, \beta$ in Lemma 6.5, we can verify that



(i) D > 2( 1 /7)" +1 > 7 -l 2 (l/^r +1 +l and (ii) £,1-7^ > 2 (7jDq + 1} 



3m-\-l 



(27) 



Part (i) ensures (23). By Lemma 6.5, keeping $\delta$ outside the union $W$ of at most $n^{O(D)}$ gaps of type $(\varepsilon+\rho)N^{O(cm^3D)}$ prevents $I$ from witnessing a phase tube split at each of its nodes, and hence keeps $K \supseteq I$ from being, as claimed, made entirely of "splitting" nodes. For this, we need to ensure that $|I| \ge D^{1-\beta^{m+1}}$, which follows from: (26); $|I| \ge \lfloor\gamma|K|\rfloor - 1$; $D = |L| \le \gamma D_0 + 1$; and part (ii) of (27).

We conclude that, as long as we choose an interval $\Delta \subseteq \mathbb{I}\setminus W$ of length $\rho$, the coding tree $\mathcal{T}^{\Delta}_{m\to n-m}$ cannot witness splits at all of the nodes of $K$ (if they exist; their existence is ensured only in $\mathcal{T}^{\mathbb{I}}_{m\to n-m}$) for the phase tube formed by any box $x + \rho\mathbb{I}^n$, where $y_1,\dots,y_r$ are fixed and $x \in C[y_1,\dots,y_r]$. Note the order of the quantifiers: first, we fix the coordinates $y_i$ and the target length $D_0$, and we pick a large enough candidate splitting node set $K$ in $\mathcal{T}^{\mathbb{I}}_{m\to n-m}$; these choices determine the exclusion zone $W$; next, we pick a suitable $\Delta$ and then claim an impossibility result for any $x$ in $C[y_1,\dots,y_r]$.

To complete the proof of Lemma 6.3 we bound, by $2^{D_0}$ and $n^{O(nD_0)}$ respectively, the number of ways of choosing $K$ (hence $I$, $L$) and the number of nodes $v$ in $\mathcal{T}_{m\to n-m}$ of depth $t_v = D_0$. □



Proof of Lemma 6.4. We can make the assumption that $I$ includes 0, since all cases easily reduce to it. Indeed, let $l$ be the smallest index in $I$. If $l > 0$, subtract $l$ from the indices of $I$ to define $I' \ni 0$. Form the matrix $V'_{|I'}$ of vectors $v'_k$, where $v_{k+l} = v'_kA_l\cdots A_0$. Rewriting $V_{|I}$ as $V'_{|I'}A_l\cdots A_0$ takes us to the desired case: we (cosmetically) duplicate the last matrix, $A_D$, $l$ times to match the lemma's assumptions and observe that, if $u^TV'_{|I'} = 0$, then so does $u^TV_{|I}$. We may also assume that all $a_k$ are nonzero since the lemma is trivial otherwise. All the coordinates of $v_k$ can be expressed as $O(m^2(k+1)\log N)$-bit rationals sharing a common denominator; therefore,

$$N^{-O((k+1)m^2)} \le \|v_k\|_1 \le 2^{-k|\log\alpha| + O(\log n)}. \qquad (28)$$

The affine hull of $V_{|I}$ is the flat defined by $\{\, z^TV_{|I} : z^T\mathbf{1} = 1 \,\}$: its dimension is called the affine rank of $V_{|I}$. Let $g(D, r)$ be the maximum value of $|I|$, for $\{0\} \subseteq I \subseteq \{0,\dots,D\}$, such that $V_{|I}$ has affine rank at most $r$ and its affine hull does not contain the origin.
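The two notions just defined can be checked numerically on small matrices. The sketch below is only an illustration under our own assumptions: the helper names `affine_rank` and `hull_contains_origin` are hypothetical, the tolerance is arbitrary, and floating-point linear algebra stands in for the exact rational arithmetic of the text. The affine rank is computed as the rank of the row differences, and the origin test looks for $z$ with $z^TV = 0$ and $z^T\mathbf{1} = 1$.

```python
import numpy as np

def affine_rank(V, tol=1e-9):
    """Dimension of the affine hull of the rows of V (rank of row differences)."""
    V = np.asarray(V, dtype=float)
    if V.shape[0] <= 1:
        return 0
    return int(np.linalg.matrix_rank(V[1:] - V[0], tol=tol))

def hull_contains_origin(V, tol=1e-9):
    """True if some z with z^T 1 = 1 satisfies z^T V = 0."""
    V = np.asarray(V, dtype=float)
    k = V.shape[0]
    # Stack the constraints z^T V = 0 and z^T 1 = 1 into one linear system.
    A = np.hstack([V, np.ones((k, 1))]).T
    b = np.zeros(V.shape[1] + 1)
    b[-1] = 1.0
    z, *_ = np.linalg.lstsq(A, b, rcond=None)
    # The system is feasible exactly when the least-squares residual vanishes.
    return bool(np.linalg.norm(A @ z - b) < tol)

# Hypothetical examples: the rows (1,0),(0,1) span the line x + y = 1,
# which misses the origin; the rows (1,0),(-1,0) span a line through it.
V_off = [[1.0, 0.0], [0.0, 1.0]]
V_through = [[1.0, 0.0], [-1.0, 0.0]]
```

When the affine hull misses the origin, the (linear) rank exceeds the affine rank by one, which is the fact used repeatedly in the proof.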



Lemma 6.4 follows from this inequality, whose proof we postpone: for $r = 0,\dots,m-1$,

$$g(D, r) < D^{1-\beta^{m+1}}, \quad\text{for any } D \ge 2^{(1/\beta)^{m+1}}, \qquad (29)$$

where $\beta = |\log\alpha|/(cm^3\log N)$, for constant $c$ large enough. Indeed, given any $\{0\} \subseteq I \subseteq \{0,\dots,D\}$ of size at least $D^{1-\beta^{m+1}}$, we have $|I| > g(D, m-1)$, so the affine hull of $V_{|I}$ contains the origin. If $r$ is its affine rank, then there exists $J \subseteq I$ of size $r+1$ such that the affine rank of $V_{|J}$ is $r$ and its affine hull contains the origin, hence coincides with the row space of $V_{|J}$, 29 which is therefore of dimension $r$. This implies the existence of $r$ independent columns in $V_{|J}$ spanning its column space: add a column of $r+1$ ones to the right of them to form the $(r+1)$-by-$(r+1)$ matrix $M$. Since the affine hull of $V_{|J}$ contains the origin, there exists $z$ such that $z^TV_{|J} = 0$ and $z^T\mathbf{1} = 1$, which in turn shows that $\mathbf{1}_{r+1}$ lies outside the column space of $V_{|J}$; therefore $M$ is nonsingular. Since each one of its rows consists of $O(m^2D\log N)$-bit rationals with a common denominator,

$$|\det M| \ge N^{-O(m^3D)}. \qquad (30)$$

Let $\xi$ be the $(r+1)$-dimensional vector whose $k$-th coordinate is the cofactor of the $k$-th entry in the last column of ones in $M$. Simple determinant cofactor expansions show that

$$\xi^TM = (0,\dots,0,\det M).$$

Since the first $r$ columns of $M$ span the column space of $V_{|J}$, it follows that

$$\xi^T(V_{|J}, \mathbf{1}_{r+1}) = (0,\dots,0,\det M).$$

By Hadamard's inequality, each coordinate of $\xi$ is at most $n^{O(m)}$ in absolute value, so straightforward rescaling and padding with zeroes turns $\xi$ into a suitable vector $u$ such that $u^TV_{|I} = 0$ and $u^T\mathbf{1} \ge N^{-c_1m^3D}$, for an absolute constant $c_1$ that does not depend on $c$. Replacing $c$ by $\max\{c, c_1\}$ establishes Lemma 6.4.
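The cofactor construction above can be illustrated on a tiny instance. This is a minimal sketch, not the paper's procedure: the matrix `V_J` is a made-up example with $r = 2$ whose rows have an affine hull containing the origin, the helper `cofactors_of_last_column` is a hypothetical name, and floating-point determinants replace exact rational ones.

```python
import numpy as np

def cofactors_of_last_column(M):
    """Vector xi whose k-th entry is the cofactor of M[k, last]."""
    n = M.shape[0]
    xi = np.empty(n)
    for k in range(n):
        # Delete row k and the last column, then apply the checkerboard sign.
        minor = np.delete(np.delete(M, k, axis=0), n - 1, axis=1)
        xi[k] = (-1) ** (k + n - 1) * np.linalg.det(minor)
    return xi

# Hypothetical V_J: three non-collinear points of the plane (r = 2).
V_J = np.array([[2.0, 1.0], [0.0, 3.0], [1.0, 1.0]])
M = np.hstack([V_J, np.ones((3, 1))])      # append the column of ones
xi = cofactors_of_last_column(M)
product = xi @ M                           # should equal (0, 0, det M)

# Rescale xi into a unit vector u with u^T V_J = 0 and u^T 1 > 0.
u = np.sign(np.linalg.det(M)) * xi / np.linalg.norm(xi)
```

Here $\xi = (-3, -1, 6)$ and $\det M = 2$, so the first two coordinates of $\xi^TM$ vanish while the last one equals the determinant, as the expansion along the column of ones predicts.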



It suffices now to prove (29), which we do by induction on $r$. If $V_{|I}$ has affine rank $r = 0$ and its affine hull does not contain the origin, then all the rows of $V_{|I}$ are equal and nonzero. Since $V_{|I}$ has the row $v_0$, it follows from (28) that $|I| \le 1 + \max\{k \in I\} = O(|\log\alpha|^{-1}m^2\log N)$, hence

$$g(D, 0) \le \beta^{-1}. \qquad (31)$$

Assume now that $r > 0$ and that $V_{|I}$ has affine rank exactly $r$. Put $I = \{k_0, k_1, \dots, k_i\}$, with $k_0 = 0$, and consider the smallest $j$ such that $V_{|J}$ has affine rank $r$, where $J = \{k_0, k_1, \dots, k_j\} \subseteq I$. Since the origin is not in the affine hull of $V_{|I}$, hence of $V_{|J}$, we can always pick a subset $K \subseteq J$ consisting of $r+1$ independent rows: let $M = V_{|K\cup\{k_i\}}$ denote the $(r+2)$-by-$m$ matrix formed by adding the row $v_{k_i}$ at the bottom of $V_{|K}$. 31 Since $V_{|I}$ has affine rank $r$, its rank is $r+1$ (using once again the noninclusion of $O$ in the affine hull of $V_{|I}$), hence so is the rank of $M$. We show that if $k_i$ is large enough, the system below is feasible in $\xi \in \mathbb{R}^{r+2}$:

$$\xi^TM^+ = (\underbrace{0,\dots,0}_{m}, 1), \qquad (32)$$



Because any $y^TV_{|J}$ can be written as $(y + (1 - y^T\mathbf{1})z)^TV_{|J}$, where $z^TV_{|J} = 0$ and $z^T\mathbf{1} = 1$.
Otherwise, $1 = z^T\mathbf{1} = z^TV_{|J}\,y = 0$.

It may be the case that $i = j$ or $k_i \in K$. Since $r > 0$, we have $k_i \ge k_j \ge 1$ and $j > 0$.
Figure 8: Why a large value of $k_i$ implies that the affine hull of $V_{|J}$, hence of $M$, contains the origin.



where $M^+$ is the $(r+2)$-by-$(m+1)$ matrix $(M, \mathbf{1}_{r+2})$, which leads to a contradiction. This is the crux of the argument and makes essential use of the rapid decay of the vectors $v_k$. Assume that $k_i > ck_j|\log\alpha|^{-1}m^3\log N$, for a large enough constant $c$. We first show that $M^+$ is of rank $r+2$. Pick $r+1$ independent columns of $V_{|K}$, which is possible since the latter has rank $r+1$, to form the full-ranked $(r+1)$-by-$(r+1)$ matrix $Q$. Add a new row to it by fitting the relevant part of $v_{k_i}$ (the last row of $M$) and call $R$ the resulting $(r+2)$-by-$(r+1)$ matrix (Fig. 8); consistent with our notation, $R^+$ will denote the matrix $(R, \mathbf{1})$. A cofactor expansion of the determinant of $R^+$ along the bottom row shows that

$$|\det R^+| \ge |\det Q| - \Delta\|v_{k_i}\|_1,$$

where $\Delta$ is an upper bound on the absolute values of the cofactors other than $\det Q$. In view of (28), the matrix entries involved in these cofactors are all $n^{O(1)}$ in absolute value; by Hadamard's inequality, this shows that we can set $\Delta = n^{O(m)}$. Likewise, we find that

$$\|v_{k_i}\|_1 \le 2^{-k_i|\log\alpha| + O(\log n)}.$$



Since $Q$ is nonsingular, we can adapt (30) to derive $|\det Q| \ge N^{-O(m^3k_j)}$, hence $|\det R^+| > 0$. It follows that the linear system (32) is feasible if we replace $M^+$ by $R^+$. As it happens, there is no need to do so since every column of $M$ missing from $R$ lies in the column space of the latter: thus the missing homogeneous equalities are automatically satisfied by the solution $\xi$. The feasibility of (32) contradicts our assumption that the origin is outside the affine hull of $V_{|I}$; therefore

$$k_j \ge \beta k_i > 0, \qquad (33)$$

where $\beta = |\log\alpha|/(cm^3\log N)$. The affine rank of $V_{|\{k_0,\dots,k_{j-1}\}}$ is $r-1$ and its affine hull does not contain the origin, so $j \le g(k_{j-1}, r-1)$, with $g(0, r-1) = 1$. Let $w_0 = a_{k_j}$ and, for $k > 0$, $w_k = a_{k_j+k}A_{k_j+k}\cdots A_{k_j+1}$, thus ensuring that $v_{k_j+k} = w_kA_{k_j}\cdots A_0$. Since the affine hull of $V_{|I}$ does not contain the origin, neither does that of the matrix $W$ with rows $w_0, w_{k_{j+1}-k_j}, \dots, w_{k_i-k_j}$. It follows that the affine rank of $W$ is less than $m$, so $i - j + 1 \le g(k_i - k_j, m-1)$, hence 32 $i \le g(k_{j-1}, r-1) + g(k_i - k_j, m-1) - 1$. By (33)



32 It would be nice to bound the affine rank as a function of r, but since we never perturb the 
transition matrices it is unclear how to do that. 
and $i = |I| - 1$, we derive, by monotonicity,

$$|I| \le g(k, r-1) + g(D-k, m-1),$$

where $\beta D \le k \le D$; hence, by (31), for $m > 0$ and $D \ge 0$:

$$g(D, r) \le \begin{cases} 1 & \text{if } D = 0 \\ \beta^{-1} & \text{if } r = 0 \\ g(n_1, m-1) + \cdots + g(n_r, m-1) + \beta^{-1} & \text{if } 0 < r < m, \end{cases}$$

where $n_1 + \cdots + n_r \le (1 - \beta^s)D$, with $s = |\{i \mid n_i > 0\}|$. Setting $\eta = \beta^m$, we check that, for all $D > 0$ and $m > 0$,

$$g(D, m-1) \le \beta^{-2}(2D^{1-\eta} - 1). \qquad (34)$$



The case $m = 1$ follows from $g(D, 0) \le \beta^{-1}$. For $m > 1$, we begin with the case $s = 0$, where

$$g(D, m-1) \le m - 1 + \beta^{-1} \le \beta^{-2}(2D^{1-\eta} - 1).$$

This follows from $\alpha \ge N^{-O(1)}$, which implies that $\beta m^3$ can be made arbitrarily small by increasing $c$. For $s = 1$,

$$\begin{aligned} g(D, m-1) &\le \beta^{-2}(2(1-\beta)^{1-\eta}D^{1-\eta} - 1) + m - 2 + \beta^{-1} \\ &\le 2\beta^{-2}D^{1-\eta} - (2\beta^{-1}(1-\eta) - O(1))D^{1-\eta} - \beta^{-2} + \beta^{-1} + m - 2 \\ &\le \beta^{-2}(2D^{1-\eta} - 1). \end{aligned}$$

Assume that $s > 1$. Being concave, the function $x \mapsto x^{1-\eta}$ is subadditive for $x \ge 0$; therefore,

$$n_1^{1-\eta} + \cdots + n_r^{1-\eta} \le (1-\beta^s)^{1-\eta}D^{1-\eta}.$$

Setting $r = m-1$, relation (34) follows from the inequality,

$$\begin{aligned} g(D, m-1) &\le \beta^{-2}(2(1-\beta^s)^{1-\eta}D^{1-\eta} - s) + m - s - 1 + \beta^{-1} \\ &\le 2\beta^{-2}(1-\beta^{m-1})^{1-\eta}D^{1-\eta} - \beta^{-2} \le 2\beta^{-2}D^{1-\eta} - \beta^{-2}, \end{aligned}$$

which proves (34), hence (29) and Lemma 6.4. □
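The step from (34) to (29) is left implicit, so here is a hedged numerical sanity check. It assumes our reading of the two bounds, namely $g(D, m-1) \le \beta^{-2}(2D^{1-\eta} - 1)$ with $\eta = \beta^m$ and the target $D^{1-\beta^{m+1}}$; the function name `bound_34_implies_29` is hypothetical. Taking base-2 logarithms, $2\beta^{-2}D^{1-\eta} \le D^{1-\beta^{m+1}}$ is equivalent to $(\beta^m - \beta^{m+1})\log_2 D \ge 1 + 2\log_2(1/\beta)$, which is what the code tests, working in log space because $D$ itself is astronomically large.

```python
import math

def bound_34_implies_29(beta, m, log2_D):
    """Check 2 * beta^-2 * D^(1 - beta^m) <= D^(1 - beta^(m+1)) in log2 space."""
    eta = beta ** m
    # log2 of the (slightly weakened) right-hand side of (34)
    lhs = 1 + 2 * math.log2(1 / beta) + (1 - eta) * log2_D
    # log2 of the bound claimed in (29)
    rhs = (1 - beta ** (m + 1)) * log2_D
    return lhs <= rhs

# At the threshold D = 2^((1/beta)^(m+1)), i.e. log2 D = (1/beta)^(m+1):
small_beta_ok = bound_34_implies_29(1 / 16, 2, 16.0 ** 3)
large_beta_ok = bound_34_implies_29(1 / 2, 2, 2.0 ** 3)
```

At the threshold the condition reduces to $1/\beta - 1 \ge 1 + 2\log_2(1/\beta)$, which holds once $\beta$ is small enough (for instance $\beta = 1/16$) and fails for $\beta = 1/2$; this matches the remark that $\beta$ can be assumed to be much less than 1.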



6.3 The degree structure 

We decompose the global coding tree into three layers: the top one has no degree constraints; the second has mean degree less than two; and the third has no branching. Consider a placement of the $B$-agents, such that the diameter of each $B_i$ is less than $n^{-bn}$. By the agreement rule, the communication subgraph induced by the $B$-agents is frozen and its transition matrix $Q$ is fixed and independent of the particular placement of the $B$-agents. By Perron-Frobenius, or simply by repeating the proof of Lemma 6.1, we derive the existence of a rank-$r$ stochastic matrix

$$\widetilde{Q} = \mathrm{diag}(\mathbf{1}_{n_1}\chi_1^T, \dots, \mathbf{1}_{n_r}\chi_r^T)$$

such that $\chi_i \in \mathbb{R}^{n_i}$ and $\|Q^k - \widetilde{Q}\|_{\max} \le e^{-\gamma k}$. The $B$-agents find themselves attracted to the fixed point $y = \widetilde{Q}\xi$, where $\xi \in \mathbb{R}^{n-m}$ is their initial state vector and

$$y = (\underbrace{y_1,\dots,y_1}_{n_1}, \dots, \underbrace{y_r,\dots,y_r}_{n_r}).$$

Define $\Upsilon = \Omega^m \times (\Omega^{n-m} \cap \Upsilon_B)$, where

$$\Upsilon_B = y + (n^{-2bn}\mathbb{I}^{n-m}) \cap \ker\widetilde{Q}.$$

If $x \in \Upsilon$, the diameter of any group $B_i$ is at most $2n^{-2bn} < n^{-bn}$, so the communication graph induced by their agents is frozen and remains so. The $B$-agents are attracted to $y$. 34 This follows easily, as does the next lemma, whose proof we omit, from the stochasticity of $Q$ and the identities: $Q\widetilde{Q} = \widetilde{Q}Q = \widetilde{Q}^2 = \widetilde{Q}$.

Lemma 6.6. The set $\Upsilon$ is forward-invariant. Furthermore, any $\xi \in y + n^{-2bn}\mathbb{I}^{n-m}$ belongs to $\Upsilon_B$ if and only if $\widetilde{Q}\xi = y$.
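To make the limit matrix and the identities behind Lemma 6.6 concrete, here is a small numerical sketch. Everything in it is an illustrative assumption rather than data from the paper: a frozen $B$-agent matrix `Q` with $r = 2$ blocks of sizes 2 and 1, the helper name `stationary`, and floating-point arithmetic. The limit is assembled block by block as $\mathbf{1}\chi^T$, where $\chi$ is the stationary distribution of the block.

```python
import numpy as np

def stationary(B):
    """Stationary distribution chi of an ergodic stochastic block B (chi^T B = chi^T)."""
    n = B.shape[0]
    A = np.vstack([B.T - np.eye(n), np.ones((1, n))])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    chi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return chi

# Hypothetical frozen transition matrix with two blocks (sizes 2 and 1).
B1 = np.array([[0.5, 0.5], [0.25, 0.75]])
B2 = np.array([[1.0]])
Q = np.zeros((3, 3))
Q[:2, :2] = B1
Q[2:, 2:] = B2

# Limit matrix diag(1 chi_1^T, 1 chi_2^T).
Q_lim = np.zeros((3, 3))
Q_lim[:2, :2] = np.outer(np.ones(2), stationary(B1))
Q_lim[2:, 2:] = np.outer(np.ones(1), stationary(B2))

xi = np.array([0.2, 0.8, 0.5])      # hypothetical initial B-agent positions
y = Q_lim @ xi                      # their limit positions, constant on each block
```

In this instance $\chi_1 = (1/3, 2/3)$, so the first two agents converge to $0.6$ while the isolated third agent stays at $0.5$; the powers of `Q` converge to `Q_lim` geometrically, at the rate of the second eigenvalue $1/4$ of the first block.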



We set $\varepsilon, \rho, D_0$ as in Lemma 6.3 and call an interval $\Delta$ free if it does not intersect the exclusion zone $W = W(y)$. As usual, we choose the perturbation sample space $n^{-b}\mathbb{I}$ to make perturbations inconsequential in practice. For counting purposes, it is convenient to partition $n^{-b}\mathbb{I}$ into canonical intervals of length $\rho$ (with possibly a single smaller one). A gap of $W$ can keep only $(1 + \varepsilon/\rho)n^{O(n^5D_0)}$ canonical intervals from being free, so the Lebesgue measure of the free ones satisfies:

$$\mathrm{Leb}\Bigl\{ \bigcup \text{free canonical intervals} \Bigr\} \ge 2n^{-b} - (\varepsilon+\rho)n^{O(n^5D_0)}. \qquad (35)$$
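The counting argument behind (35) can be mimicked on a toy instance. The sketch below is an assumption-laden illustration: the function name `free_measure`, the sample half-width, the value of $\rho$ and the single gap are all made up, and exact rationals are used to avoid rounding at interval endpoints. It partitions $[-h, h]$ into canonical intervals of length $\rho$ and sums the lengths of those that overlap no gap; a gap of length $g$ can block at most $g/\rho + 2$ canonical intervals, hence a measure of at most $g + 2\rho$.

```python
import math
from fractions import Fraction

def free_measure(half_width, rho, gaps):
    """Total length of the canonical intervals of [-half_width, half_width]
    that do not overlap any gap (gaps are (left, right) pairs)."""
    count = math.ceil(2 * half_width / rho)
    total = Fraction(0)
    for i in range(count):
        lo = -half_width + i * rho
        hi = min(lo + rho, half_width)     # the last interval may be shorter
        # An interval is free if it lies entirely to one side of every gap.
        if all(hi <= a or lo >= b for a, b in gaps):
            total += hi - lo
    return total

h = Fraction(1, 2)                         # stands in for n^{-b}
rho = Fraction(1, 10)
gaps = [(Fraction(-1, 20), Fraction(1, 20))]
measure = free_measure(h, rho, gaps)
lower_bound = 2 * h - sum((b - a) + 2 * rho for a, b in gaps)
```

The single gap straddles two canonical intervals, so the free measure is $4/5$, above the crude lower bound $7/10$ obtained by charging each gap its own length plus $2\rho$.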



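Fixing the $B$-agent attractor. With $y$ fixed, we pick a free canonical interval $\Delta$ and focus on the global coding tree $\mathcal{T}^{\Delta|\Upsilon}_{m\to n-m}$, with the superscripts indicating the perturbation and phase spaces, respectively. For any node $v$ of depth $t_v \ge t_c$, the limit matrix $D_u$ in Lemma 6.2 is the same for all nodes $u$ of depth $t_c$. Indeed, writing $\widetilde{Q}$ for the limit of the powers of $Q$,

$$\left\| P_{\le v} - \begin{pmatrix} A_{\le v} & C_v \\ 0 & \widetilde{Q} \end{pmatrix} \right\|_{\max} \le e^{-\gamma t_v}.$$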



The system under consideration is the non-Markovian extension defined by the persistent graph $H$.

34 Although the $B$-agents in $\Upsilon_B$ have been essentially immobilized around $y$, they are not decoupled from the rest. Indeed, while the increasingly microscopic movement of the $B$-agents can no longer affect their own communication graph, it can still influence the communication among the $A$-agents: furthermore, this may still be true even if no edge is ever to join an $A$-agent to a $B$-agent.

35 The reason we do not fix the perturbation $\delta$ is that it needs to be randomized and it is easier to avoid randomizing the coding tree itself.
Pick $v$ of depth $t_v \ge 3t_c$ and let $w$ be its ancestor at depth $t_w = \lfloor t_v/2\rfloor$. Given $x \in U_v \subseteq \Upsilon$,

x' = /*» (x) =V£ f ^ (x m+1 , . . . , * n ) T + ne"^ F 

C«,(x m +l, . . . , X n ) T \ _|_ ne -7t™ 



By Lemma 6.6, x' E T, so there exists a node u' of depth t v i = t v — t w >t c such that, 

/*»(x) = /'"'(x) = P<„,x' G ^'J (y+ne" 7 *- I")+ne- 7 '"' I n C (^ v ^+2ne-^ /3 I n . 

It is important to note that $v'$ depends only on $v$ and not on $x \in U_v$: indeed, the phase tube from $U_v$ between time $t_w$ and $t_v$ does not split; therefore $f^{t_w}(U_v) \subseteq U_{v'}$. It follows that, for $t_v \ge 3t_c$ and $v' = v'(v)$,



V v C ^ y ^) +2ne-^/ 3 r. 



(36) 



The $A$-agents evolve toward convex combinations of the $B$-agents, which themselves become static. The weights of these combinations (i.e., the barycentric coordinates of the $A$-agents), however, might change at every node, so there is no assurance that the orbit is always attracted to a limit cycle. The layer decomposition of the coding tree, which we describe next, allows us to bound the nesting time while exhibiting weak yet sufficient conditions for periodicity.

Figure 9: The global coding tree is stratified into three layers, with decreasing branching rates.

To stratify the coding tree $\mathcal{T}^{\Delta|\Upsilon}_{m\to n-m}$ into layers, we set up three parameters $D_0$, $D_1$, and $D_2$: the first targets the topological entropy; the second specifies the height of the first layer; the third indicates the nesting time. We examine each one in turn and indicate their purposes and requirements.
First layer. By (36), the phase tubes get thinner over time at a rate of roughly $e^{-\gamma/3}$, while the tree is branching at a rate of $n^{O(n)}$. To ensure that the topological entropy is zero, the product of these two rates should be less than 1: with $\gamma < 1$, this is far from being the case, so we need a sparsification mechanism. This is where Lemma 6.3 comes in. Indeed, deep enough in $\mathcal{T}^{\Delta|\Upsilon}_{m\to n-m}$, the size of a subtree of height $D_0$ is at most 36

$$D_0\, n^{O(nD_0^{1-\gamma^{n+1}})},$$

while the tubes get thinner at a rate of $2ne^{-\gamma D_0/3}$ for every consecutive $D_0$ nodes: the choice of $D_0$ below ensures that the product is less than 1, as desired. We justify this choice formally below.

Dq > 2^^^ [ Dq big enough for thinning to outpace branching ]. (37) 



Second layer. Technically, Lemma 6.3 addresses only the branching of the phase tube formed by a small box $x + \rho\mathbb{I}^n$, for $x \in C[y_1,\dots,y_r]$, whereas we are concerned here with phase tubes originating at some cell $V_v$ of $\mathcal{T}^{\Delta|\Upsilon}_{m\to n-m}$. To make $V_v$ thin enough, we choose a node $v$ deep in the tree. By (36), $V_v \subseteq x + \rho\mathbb{I}^n$, for $x \in C[y_1,\dots,y_r]$, provided that $t_v \ge D_1$ and

$$D_1 \ge \frac{3}{\gamma}\log\frac{2n}{\rho} \quad [\,D_1 \text{ big enough for tree branches to be thinner than } \rho\,]. \qquad (38)$$
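Condition (38) simply inverts the thinning bound $2ne^{-\gamma t/3}$ from (36). The following sketch checks this on made-up numbers; the function names `first_layer_height` and `tube_width` and the sample values of $n$, $\gamma$, $\rho$ are our own assumptions, and the logarithm is taken to be natural, to match the exponential in (36).

```python
import math

def first_layer_height(n, gamma, rho):
    """Smallest integer D_1 with D_1 >= (3 / gamma) * ln(2n / rho)."""
    return math.ceil(3.0 / gamma * math.log(2 * n / rho))

def tube_width(n, gamma, t):
    """Thinning bound 2 n exp(-gamma t / 3) on a cell at depth t, as in (36)."""
    return 2 * n * math.exp(-gamma * t / 3)

# Hypothetical parameters, far smaller than those used in the proof.
n, gamma, rho = 4, 0.1, 1e-3
D1 = first_layer_height(n, gamma, rho)
```

With these values $D_1 = 270$: the bound drops below $\rho$ exactly at that depth and is still above it one step earlier.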



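Note that the requirement in (36) that $t_v \ge 3t_c = 3/\gamma$ is implied by $t_v \ge D_1$. In view of Lemma 6.3, the number of nodes in $\mathcal{T}^{\Delta|\Upsilon}_{m\to n-m}$ of depth no greater than $t \ge D_1$ is bounded by

$$\underbrace{n^{O(nD_1)}}_{\text{depth } D_1} \times \underbrace{n^{O(nD_0^{1-\gamma^{n+1}}\lfloor (t-D_1)/D_0\rfloor)}}_{\text{from } D_1 \text{ to } t \text{ in chunks of } D_0} \times \underbrace{n^{O(nD_0)}}_{\text{truncated chunk}} \times \underbrace{D_0}_{\text{single paths}};$$

hence, for any $t \ge D_1$,

$$\left|\{\, v \in \mathcal{T}^{\Delta|\Upsilon}_{m\to n-m} : t_v \le t \,\}\right| \le n^{O(nD_0 + nD_1 + ntD_0^{-\gamma^{n+1}})}. \qquad (39)$$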



This assumes a thinness condition we discuss below. The factor $D_0$ comes from the nonbranching paths in the subtree spanned by the phase tubes from $\Upsilon$.

37 Factoring out the $B$-agents gives us the sort of fixed-point attraction that is required by Lemma 6.3: it is a dimension reduction device in attractor space.
Third layer. The bottom layer of the stratified global coding tree begins at a depth $D_2 \ge D_0 + D_1$. If the node $v$ of depth $t_v \ge D_2$ has nontrivial branching, then, by continuity, $V_v$ contains a point right on the boundary of the global margin. By (36), this implies the existence of $\zeta \in \mathbb{R}^n$ such that $\|\zeta\|_\infty \le 2ne^{-\gamma D_2/3}$ and $\mathrm{Aff}[y + \zeta] = \delta$, where the coefficients of the affine form are of magnitude $n^{O(1)}$ and depend only on the node $v$. It then follows from (39) that $\mathcal{T}^{\Delta'|\Upsilon}_{m\to n-m}$ has no nontrivial branching at depth $D_2$, provided that $\Delta' \subseteq \Delta\setminus W'$, where $W'$ consists of gaps of type $n^{O(1)}e^{-\gamma D_2/3}$ numbering at most

$$\underbrace{n^{O(nD_0 + nD_1 + nD_2D_0^{-\gamma^{n+1}})}}_{\#\text{ nodes at depth } D_2} \times \underbrace{n^{O(1)}}_{\#\text{ margin slabs}}.$$

This calculation, in which $\varepsilon$ played no role, puts a bound of $D_2$ on the nesting time. It follows that

$$\mathrm{Leb}(W') \le e^{-\gamma D_2/3}\, n^{O(nD_0 + nD_1 + nD_2D_0^{-\gamma^{n+1}})}. \qquad (40)$$

Pick an arbitrarily small e Q > and a large enough constant d 



1 



n 



-cnt 



We set the parameters p 



e 2 n dn5 °o and e < min{/j, e l£>2 



(40) 

d(b, c); recall that 
}, where, 



rounding up to the nearest integer, 

D = 
Di = 
Do = 



2 d(i/ 7 ) 

d 2 ( m % 



n+2 



(n b D + |loge o | 
n 2 D 1 + |loge |) 



(41) 



We verify that conditions (37, 38) are both satisfied and that 



(42) 



Thus the measure bound (40) implies that $\mathrm{Leb}(W') \le \rho\,2^{-D_0}$. By making $\varepsilon$ tend to 0, the point $x$ vanishes with arbitrarily small probability for random $\delta \in \Delta'$. By Lemma 4.4, this implies that, with probability at least $1 - 2^{-D_0}$, subjecting the system's margin to a perturbation $\delta$ chosen randomly in $\Delta$ makes the orbit of any $x \in \Upsilon$ attracted to a limit cycle (or a switching leaf); 39 we call this success. The sum of the period and preperiod is bounded by the number of nodes of depth at most $D_2$ (the nesting time), which, by (39, 42), is no greater, conservatively, than



V 



n O{nD x ) < ( 1/£o )0(7- 2 ) 2 Do7- 1 



t o(i) 



(43) 



We bound the attraction rate by appealing to Lemmas 4.5 and 6.2. Note that if $p$ is the period then so is $p\lceil(\log 2n)/\gamma\rceil$. This choice of $p$ still satisfies the upper bound (43)



while ensuring that, at every period, the error bound in Lemma 6.2 is at most The 



A node is branching nontrivially if it has at least two children neither of which is switching or 
vanishing. 

39 The regions $W$ and $W'$, which make perturbation a requirement, depend only on $y$. But perturbation is also needed to avoid vanishing, which depends on the initial state $x$.
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row-sums in A in Lemma 4.5 are at most 1/2, so we can set \i = e 7 . Since v < D2 and 
(#5 l/ )[(log2n)/7] < p, it follows that 



< 



^0(A 1 ) (1/eo) 0( 7 -) log i ! 



(44) 



for any $0 < a < 1/2$. The perturbation space is not $\Delta$ but $n^{-b}\mathbb{I}$, so we apply the previous result to each free canonical interval and argue as follows. If $\Lambda$ is the measure of the union of all the free canonical intervals, then the perturbations that do not guarantee success have measure at most $(2n^{-b} - \Lambda) + 2^{-D_0}\Lambda$. Dividing by $2n^{-b}$ and applying (35) shows that

$$\mathrm{Prob}\bigl[\text{failure in } \mathcal{T}^{n^{-b}\mathbb{I}|\Upsilon}_{m\to n-m}\bigr] \le 1 - (1 - 2^{-D_0})(1 - (\varepsilon+\rho)n^{O(n^5D_0)}) \le 2^{1-D_0}. \qquad (45)$$



The nesting time is at most $D_2$, which, by (39, 42), implies that

$$h(\mathcal{T}^{n^{-b}\mathbb{I}|\Upsilon}_{m\to n-m}) \le O(D_1n\log n) \le \gamma^{-1}(D_0 + |\log\varepsilon_0|)n^{O(1)}. \qquad (46)$$



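Freeing the $B$-agents. Set $D_3 = \lceil 3bt_cn\log n\rceil$ and fix $x$ in $\Omega^n$. Let $\xi$ denote the projection of $f^{D_3}(x)$ onto the last $n-m$ coordinate axes. By Lemma 6.2 and $t_c = 1/\gamma$ (Lemma 6.1), the coding tree $\mathcal{T}^{n^{-b}\mathbb{I}|\Omega^n}_{m\to n-m}$ has $n^{O(nt_c)}$ nodes $u$ such that $t_u = t_c$ and

$$\xi \in y + ne^{-\gamma D_3}\mathbb{I}^{n-m} \subseteq y + n^{-2bn}\mathbb{I}^{n-m},$$

where $y = D_u(x_{m+1},\dots,x_n)^T$. The state vector for the $B$-agents is $\xi$ at time $D_3$ and $Q^{t-D_3}\xi$ at $t \ge D_3$, where $Q$ is the transition matrix of the frozen communication subgraph joining the $B$-agents at time $D_3$. By taking $t$ to infinity, it follows that $y = \widetilde{Q}\xi$ (with $\widetilde{Q} = \lim_{k\to\infty}Q^k$) and, by Lemma 6.6, $\xi \in \Upsilon_B$; hence $f^{D_3}(x) \in \Upsilon$. We can then apply the previous result. Since $x$ is fixed, only the choice of random perturbation $\delta$ can change which path in $\mathcal{T}_{m\to n-m}$ the orbit will follow. The failure probability of (45) needs to be multiplied by the number of nodes $u$, which yields an upper bound of $n^{O(nt_c)}2^{1-D_0}$; hence

$$\mathrm{Prob}\bigl[\text{failure in } \mathcal{T}^{n^{-b}\mathbb{I}|\Omega^n}_{m\to n-m}\bigr] \le 2^{-D_0/2}. \qquad (47)$$

If $\mathcal{T}^*$ denotes the part of the global coding tree extending to depth $D_3$, then

$$\mathcal{T}^{n^{-b}\mathbb{I}|\Omega^n}_{m\to n-m} \Longrightarrow \mathcal{T}^* \otimes \mathcal{T}^{n^{-b}\mathbb{I}|\Upsilon}_{m\to n-m}.$$

The tree $\mathcal{T}^*$ has at most $n^{O(nt_c)}D_3$ nodes; therefore, by (46),

$$h(\mathcal{T}^{n^{-b}\mathbb{I}|\Omega^n}_{m\to n-m}) \le O(t_cn\log n + \log D_3) + h(\mathcal{T}^{n^{-b}\mathbb{I}|\Upsilon}_{m\to n-m}) \le \gamma^{-1}(D_0 + |\log\varepsilon_0|)n^{O(1)}. \qquad (48)$$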
6.4 Removing persistence

We use direct products to relax the condition that the permanent graph $H$ be fixed once and for all. This touches on the non-Markovian nature of the system, a feature we chose to ignore in the previous section. We explain now why this was legitimate. Because the switching condition is about time differences and not absolute times, any subpath in the coding tree has an incarnation as a path from the root. Equivalently, any interval of a trajectory appears as the prefix of another trajectory. This property explains why, following (23), we could argue that $A_k$ was of the form $A_{\le w}$. Likewise, in the derivations leading to (36), we used the fact that $f^{t_v}(x) = f^{t_v-t_w}(f^{t_w}(x))$, an identity that might not always hold in a non-Markovian setting, but which, in this case, did. Finally, weren't we too quick to appeal to Lemma 4.4 for periodicity since its proof relied heavily on the Markov property? To see why the answer is no, observe that the argument did establish the periodicity of the "wrap-around" system derived from $\mathcal{T}^{\Delta'|\Upsilon}_{m\to n-m}$ by redirecting any trajectory that reaches the nesting depth to the root. The only problem is that this system, being Markovian, is not the one modeled by the coding tree. Wrapping around resets the time to zero, which might cause switching conditions to be missed and trajectories to be continued when they should be stopped: none of this stops nonvanishing orbits from being periodic, however.

We now show how to relax the permanent graph assumption. The idea is to begin with $H$ set as the complete graph and update it at each switching leaf by removing the edge(s) whose absence causes the node to be a switching leaf. We then append to each such leaf the coding tree, suitably cropped, defined with respect to the new value of $H$. We model this iteration by means of direct products, using $m_k$ to denote the number of $A$-agents in the block-directional system used in the $k$-th product:

$$\mathcal{T}^{n^{-b}\mathbb{I}|\Omega^n}_{n} \Longrightarrow \bigotimes_{k=1}^{k_0} \mathcal{T}^{n^{-b}\mathbb{I}|\Omega^n}_{m_k\to n-m_k}. \qquad (49)$$

The upper limit $k_0$ is bounded by $n(n-1)$; note that each decision procedure Qij needs its own counter. To keep the failure probability small throughout the switching of dynamics, we need to update the value of $D_0$ in (41) at each iteration, so we define $C_k$ as its suitable value for a persistent graph consisting of $k$ (nonloop) directed edges and let $\phi_k$ denote the maximum failure probability for such a graph: $C_{n(n-1)} \ge D_0$ and $\phi_0 = 0$. The logarithm of the number of switching leaves is at most the word-entropy; 40 by (47, 48), for $k > 0$,

$$\phi_k \le 2^{-C_k/2} + 2^{\gamma^{-1}(C_k + |\log\varepsilon_0|)n^a}\phi_{k-1},$$

for some constant $a > 0$. Setting

$$C_{n(n-1)-j} = \lceil \gamma^{-j}n^{2aj}(D_0 + 3|\log\varepsilon_0|) \rceil$$

for $j = 0,\dots,n(n-1)$, we verify by induction that $\phi_k \le 2^{1-C_k/2}$, for $k = 0,\dots,n(n-1)$; hence,

$$\mathrm{Prob}\bigl[\text{failure in nonpersistent } \mathcal{T}^{n^{-b}\mathbb{I}|\Omega^n}_{n}\bigr] \le \phi_{n(n-1)} \le 2^{1-\frac{1}{2}(D_0 + 3|\log\varepsilon_0|)} \le \varepsilon_0.$$

40 No two switching leaves can have the same ancestor at a depth equal to the nesting time. Because we can bound the number of switching leaves, we may dispense with (10) altogether.
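The induction on $\phi_k$ can be replayed mechanically in log space. The sketch below rests on several assumptions of ours: it uses the recurrence and the choice of $C_k$ as read above, takes $1/\gamma$ to be an integer `g`, measures $|\log\varepsilon_0|$ in bits as `L`, and bounds a sum of two terms by twice the larger one. The function name `check_induction` and all sample parameters are hypothetical and far smaller than anything the proof would use.

```python
from fractions import Fraction

def check_induction(n, g, a, D0, L):
    """Verify log2(phi_k) <= 1 - C_k / 2 for k = 1..n(n-1), where
    phi_k <= 2^(-C_k/2) + 2^(g (C_k + L) n^a) phi_{k-1} and phi_0 = 0."""
    K = n * (n - 1)
    # C_{K-j} = (g n^{2a})^j (D0 + 3L); these are integers, so no ceiling is needed.
    C = {K - j: (g * n ** (2 * a)) ** j * (D0 + 3 * L) for j in range(K + 1)}
    log_phi = None                       # log2 of phi_0 = 0, i.e. minus infinity
    for k in range(1, K + 1):
        first = Fraction(-C[k], 2)
        if log_phi is None:
            log_phi = first              # only the first term survives when phi_0 = 0
        else:
            second = g * (C[k] + L) * n ** a + log_phi
            log_phi = max(first, second) + 1   # x + y <= 2 max(x, y)
        if log_phi > 1 - Fraction(C[k], 2):
            return False
    return True

ok_large_exponent = check_induction(n=3, g=2, a=2, D0=4, L=2)
ok_small_exponent = check_induction(n=2, g=2, a=1, D0=4, L=2)
```

The check succeeds when $n^a$ is comfortably large (here $n = 3$, $a = 2$) and fails for $n = 2$, $a = 1$, which reflects the fact that the growth factor $\gamma^{-1}n^{2a}$ between consecutive $C_k$ must dominate the exponent $\gamma^{-1}n^a$ in the recurrence. The final bound $2^{1-(D_0+3L)/2} \le 2^{-L}$ then only needs $D_0 + L \ge 2$.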



The attraction rate is still exponential: using (44) yields a geometric series summing 
up to 

a < C° {C °\l/e )°^ log J < O„, £o ,* (log I), (50) 
for any < a < 1/2. By ( |43[ ), the period and preperiod are bounded by 

(l/e )°^2 c ^- ln ° W < (l/e )°-«<>.*»M 
which completes the proof of Theorem □ 